As the use of AI grows, organizations will start to question the return on investment. Learn how to manage your AI costs with foundational FinOps principles.
Artificial intelligence is well and truly here, though the rate of adoption varies across consumers and businesses. When we think of AI in relation to FinOps, we look at it from two different angles: how are we using FinOps to optimize our AI costs, and how are we using AI to optimize all of our costs. The second angle includes the intersection of our cost management tools and AI capabilities, including things like Azure Copilot answering cost and forecasting questions. 
The first angle is only just starting to be whispered about in corporate hallways. Many organizations are still curious about AI and are exploring its capabilities without too much concern about cost or return on investment. They want to see if they can apply this new component to new or existing applications or to business process, to prove whether it’s beneficial. But early adopters, and savvy organizations that have a mature FinOps practice, are looking at how their cloud cost management practices wrap around this new type of cloud workload. Let’s explore some of the foundational FinOps elements and how they apply to AI.
The FinOps Framework
During its establishment and subsequent updates, the FinOps Foundation’s FinOps Framework was always intended to expand beyond just consumption-based cloud workloads. We’re seeing the rise of this with the addition of Scopes to the Framework, including Software-as-a-Service and Data Centers. And while AI isn’t a new scope because it fits into Public Cloud (or Data Center if you are hosting your own models locally), AI does benefit from the original forward-thinking approach. It provides an acid-test to see if the Principles, Domains, etc. are applicable to this new kind of workload, both for AI capabilities that purely consumption based (e.g. Azure OpenAI language models) or are SKU based with a combination of size limitations plus scale unit hours (e.g. Azure AI Search).
Principles, Core Personas, Allied Persona, Phases and Maturity all ring true when looked at from the perspective of your AI workloads. You may add a few specialized personas when you go down to a greater level of detail under Engineers – like data and prompt engineers. Learn more in the FinOps Foundation’s FinOps for AI Overview.
AI does not negate the need for teams to collaborate, everyone to take ownership of their cloud usage, and for business value to drive decision-making. The Domains and Capabilities are where we drill down into the next level of detail and examine if our existing tools and processes are up to the task of managing AI workloads.
The FinOps Foundation's FinOps FrameworkMicrosoft takes a holistic approach to adopting new workloads, reflected in our Cloud Adoption Framework and Well Architected Framework. These have been updated to include AI adoption and AI Workload Documentation.
For AI adoption, our Azure Essentials guidance introduces stages of Readiness and foundation, Design & govern, and Manage and optimize – combining detailed best practices and tools relevant to your AI adoption journey.
FinOps capabilities for adopting AI on Azure
Combine Microsoft's guidance and the FinOps Framework, and you can drill down into many of the FinOps capabilities during the stages of adopting AI on Azure:
Readiness & foundation
Planning & estimating – Planning for and estimating the cost of an AI workload requires an understanding of the entire application architecture, driven by the business requirements. Like every other application you manage, your organization will have requirements regarding security, resiliency & redundancy, recovery time and recovery points. These may be less rigorous if your application is stateless and not holding long-term data, but the architecture may need to be resilient if the application will become mission-critical or customer facing.
Next, it’s important to understand how AI services are priced. Most text-based AI services using Large Language Models (LLMs), including the Azure OpenAI Service, are priced per 1,000 tokens (where a token is a common sequence of characters found in the text). OpenAI’s Tokenizer website can demonstrate how text is broken into tokens, with efficiency improvements already being seen from GPT-3 to GPT-4o. Calculating the number of tokens in a sample conversation (inputs) & multiplying that by the number of website visitors x the percentage that may interact with a chatbot, can give us an indication of token usage and predicted cost.
Workload optimization – As well as ensuring that all the components of the application are right-sized, consider whether you are using the right AI models, pre-built models or more efficient newer models. Also investigate ways that your application design can optimize token usage in how it caches responses (including semantic caching) and sends summaries instead of entire conversation history.
Budgeting & forecasting – Good budgeting & forecasting is built on understanding current usage and adjusting for future expectations. In a future blog post, we’ll dive into how to measure and analyze AI workload usage. Once you understand the specifics of AI in your cost management tooling, the regular habit of budget reviews and forecasting adjustments is part of your overall FinOps processes.
Empower teams & foster collaboration – Just as you’re learning how AI services generate cost and how to analyze that data, ensure your teams understand this at the right level for them. Engineers might use these insights to make application design changes, while Finance teams might care less about tokens but could equate that to customer usage and hopefully a parallel increase in sales.
Design & govern
Establish AI policy & governance – It’s easy to think of this one in terms of privacy & security, but what other aspects of AI do you need to put rules around, especially in relation to cost? Are all engineers allowed to deploy all AI workloads? Is AI restricted to development subscriptions until token usage is proven with a proof of concept? Are there any relevant in-built or customer Azure Policies that you need to implement? Are self-hosted open source LLMs allowed in your organization?
Integrate intersecting disciplines like LLMOps – While we think of Engineers as the traditional personas who will develop and maintain an application, AI introduces specialized disciplines like LLMOps. Machine Learning Ops for Large Language Models (LLMOps) includes the automation of repetitive tasks, such as model building, testing, deployment, and monitoring, which improves efficiency. Though LLMs are pre-trained, MLOps can be leveraged to tune the LLMs, operationalize and monitor them effectively in production. These improvements can lead to cost reduction.
Take advantage of the variable cost model of the cloud – A key part of this is understanding how your AI service is charged, and whether run times will impact cost or not. Fine tuning models have an hourly cost in addition to a token usage rate, charged from deployment onwards, so be mindful of when you deploy them and delete them when they are no longer needed.
Implement the “crawl, walk, run” approach for continuous improvement – This one is self-explanatory and even more applicable as your organization adopts AI workloads. You will start out with a basic understanding and rudimentary decisions, which should be reviewed as your AI maturity increases.
Manage and optimize
Leverage reporting and analytics – Identify how AI usage and costs are surfaced in your existing cost management tools. We’ll cover this from a Microsoft Azure perspective in a future blog post.
Anomaly management – As for any application, understand your process or tools for detecting anomalies in usage, and what steps should be taken next.
Drive accountability – Emphasize to technical and engineering teams that their design decisions have cost implications, and that they have also have the power to help identify and implement cost efficiencies.
Rate optimization – This responsibility should lie with your FinOps team and includes both the pricing rates for your organization (for example, via an Enterprise Agreement) as well as rate discounts using Azure Reservations, which also apply to AI services using Provisioned Throughput Units.
Sustainability – An important consideration for corporate responsibility, especially if your organization has Environmental, Social & Governance (ESG) goals or reporting requirements. Leverage the Azure Carbon Optimization reports for emissions data and more.
Unit economics – Unit economics breaks down into understanding the true cost of an application and correlating that with the business metric of the desired outcome (e.g. revenue per visit). From an application cost perspective, consider capabilities like Azure API Gateway’s Azure OpenAI Token Metric policy, which collects token usage data and facilitates accurate cross-charging based on token consumption. Then explore the outcome of your AI solution to identify business value indicators. These may be easier to measure if you’ve integrated AI with your e-commerce site, but a little trickier if it’s an internal chatbot that is improving business productivity.
Conclusion
If you have an established FinOps practice, you’ll find many of the process and capabilities are applicable to adopting and managing AI workloads on Azure.
If you haven’t explored FinOps in depth yet, new AI workloads may be the catalyst for establishing a FinOps practice, but these foundational principles will also benefit your entire cloud, SaaS and data center estate.
Learn how to navigate the financial landscape for successful AI adoption, including security funding, establishing your organizational readiness and managing your AI investments.