Understanding AI workload cost considerations

Microsoft

Apr 11, 2025

As with all good pricing questions, if you ask how much AI will cost, the answer is “it depends.” If you are building an AI-enabled application, your design decisions will influence that cost, including both the traditional architecture components and your choice of AI models. This article looks at application architectures, tokens, and carbon emissions.

What does an AI application look like?

In an AI application, AI is just one component. Good architecture design starts with clear requirements and right-sizing the entire application stack to meet business needs. If your application is mission-critical or customer-facing, you’re likely to incur additional costs from design choices that improve resiliency, availability, and redundancy, such as geo-redundancy and load balancing.

Example 1: This application provides a front end for customers and store admins, REST APIs that send data to a RabbitMQ message queue and a MongoDB database, and console apps that simulate traffic. https://learn.microsoft.com/azure/aks/open-ai-quickstart

Example 2: A reference .NET application implementing an e-commerce website using a services-based architecture with .NET Aspire. https://github.com/dotnet/eshop
A variant of the same sample that uses Azure services: https://github.com/Azure-Samples/eShopOnAzure

Example 3: An application that provides text transcripts from podcast audio files. https://learn.microsoft.com/azure/architecture/ai-ml/idea/process-audio-files

You might be familiar with how architecture components like AKS, PostgreSQL, Azure Functions or Azure Storage are priced, but you’ll need to add estimated costs for your AI services. This starts with understanding how AI services are charged.
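To make this concrete, here is a minimal Python sketch of a monthly estimate that treats the AI service as one line item alongside the rest of the stack. All component names and dollar figures are hypothetical placeholders, and the `regions` multiplier is a simplifying assumption that a geo-redundant component costs the same again in each extra region; real figures should come from the Azure pricing calculator.

```python
from dataclasses import dataclass


@dataclass
class LineItem:
    name: str
    monthly_usd: float       # hypothetical estimate for one region
    redundant: bool = False  # True if the component is duplicated in every region


def monthly_estimate(items: list[LineItem], regions: int = 1) -> float:
    """Sum monthly costs, counting redundant components once per region."""
    total = 0.0
    for item in items:
        copies = regions if item.redundant else 1
        total += item.monthly_usd * copies
    return total


# Hypothetical stack: the AI service is just one line among several.
stack = [
    LineItem("AKS cluster", 600.0, redundant=True),
    LineItem("Azure Database for PostgreSQL", 300.0, redundant=True),
    LineItem("Azure Storage", 50.0),
    LineItem("Azure OpenAI (token-based estimate)", 400.0),
]

print(monthly_estimate(stack, regions=2))  # 2250.0
```

Keeping the AI estimate as its own line item makes it easy to see how much of the total it represents, and how much comes from resiliency choices elsewhere in the stack.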

AI service pricing structures

Azure OpenAI Service – pricing examples Apr 2025 USD

The first thing to note is that AI models are usually billed by the number of “tokens” they process. Tokens are the billing meter of AI models. I say “usually” because text-to-speech models are billed on the number of characters processed. Fine-tuned models also carry an hourly hosting rate from the moment they are deployed, and unlike a virtual machine, there is no way to suspend or power them down to pause that charge.
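The three meters described above (tokens, characters, and hourly hosting) can be sketched as simple Python functions. The rates below are hypothetical placeholders, not current Azure prices (real figures vary by model, region, and deployment type), and 730 hours is the monthly convention the Azure pricing calculator uses.

```python
# Hypothetical rates for illustration only; check the Azure pricing page for real figures.
USD_PER_1M_INPUT_TOKENS = 2.50
USD_PER_1M_OUTPUT_TOKENS = 10.00
USD_PER_1M_TTS_CHARACTERS = 15.00
USD_PER_FINE_TUNED_HOSTING_HOUR = 1.70
HOURS_PER_MONTH = 730


def token_cost(input_tokens: int, output_tokens: int) -> float:
    """Chat/completion models: input and output tokens are metered at different rates."""
    return (input_tokens * USD_PER_1M_INPUT_TOKENS
            + output_tokens * USD_PER_1M_OUTPUT_TOKENS) / 1_000_000


def tts_cost(characters: int) -> float:
    """Text-to-speech: metered on characters processed, not tokens."""
    return characters * USD_PER_1M_TTS_CHARACTERS / 1_000_000


def fine_tuned_hosting_cost(hours_deployed: float = HOURS_PER_MONTH) -> float:
    """Fine-tuned deployments: charged for every hour deployed, even with zero traffic."""
    return hours_deployed * USD_PER_FINE_TUNED_HOSTING_HOUR


print(token_cost(1_000_000, 200_000))  # 4.5
print(tts_cost(2_000_000))             # 30.0
print(fine_tuned_hosting_cost())       # a full month of hosting, used or not
```

Because the hosting meter runs whether or not the model receives requests, an idle fine-tuned deployment left in place for a month accrues the same hosting charge as a busy one.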

Tokens are how an AI model breaks up the characters of both its input (prompts) and its output (responses). Tokens are used because they are a consistent unit, whereas the conversion from characters to tokens varies. Newer models are more efficient at breaking words into tokens, for example recognizing “I’m” as one token instead of two (“I” and “’m”). Token counts for the same content also vary between languages (for example, English and Spanish). AI models don’t see words the way humans do; instead, they look at common sequences of characters and predict the next token in a sequence.
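Exact token counts require the model’s own tokenizer (for example, OpenAI’s open-source tiktoken library). For a first rough estimate, a common rule of thumb for English text is about four characters per token. The Python sketch below applies that assumption; the ratio is a parameter because, as described above, it shifts with the model and the language.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count.

    The 4.0 default is a rule of thumb for common English text, not a tokenizer;
    real counts vary by model and language, so treat the result as approximate.
    """
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


prompt = "Do you have waterproof hiking boots in size 10?"
print(estimate_tokens(prompt))                       # rough English estimate
print(estimate_tokens(prompt, chars_per_token=3.0))  # hypothetical denser ratio
```

Before committing to a budget, it’s worth comparing an estimate like this against the tokenizer for the model you actually plan to deploy.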

OpenAI provides a tokenizer tool where you can paste a sample of text and see how it is converted to tokens. In our example, we generated a sample conversation that someone might have with a website chatbot about hiking products.

With a token count for that sample, we can multiply it by the number of current website users, the share of them we expect to interact with the chatbot, and a little room for growth. Note: this math is based only on input tokens (the characters submitted by the human), not the AI response, which is charged as output tokens. It’s advantageous to optimize the number of tokens processed by the AI service, and we’ll discuss that in more detail in a later post.
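The multiplication described above can be sketched as a small Python function. The parameter names, the example figures, and the input-token rate are all hypothetical. Output tokens are deliberately left out to match the input-only calculation, and would need their own estimate at the output-token rate.

```python
import math


def monthly_input_tokens(
    tokens_per_conversation: int,
    monthly_site_users: int,
    chatbot_uptake: float,          # share of users expected to use the chatbot (0 to 1)
    conversations_per_user: float,  # average conversations per engaged user per month
    growth_headroom: float = 0.2,   # extra margin for growth, e.g. 0.2 = 20%
) -> int:
    """Estimate monthly input tokens for a chatbot (input side only)."""
    engaged_users = monthly_site_users * chatbot_uptake
    base = tokens_per_conversation * engaged_users * conversations_per_user
    return math.ceil(base * (1 + growth_headroom))


def input_cost(tokens: int, usd_per_1m_input_tokens: float) -> float:
    """Convert an input-token count to dollars at a per-million-token rate."""
    return tokens * usd_per_1m_input_tokens / 1_000_000


# Hypothetical inputs: 150 tokens per conversation, 50,000 monthly users,
# 10% chatbot uptake, 2 conversations each, 20% growth headroom.
tokens = monthly_input_tokens(150, 50_000, 0.10, 2)
print(tokens, input_cost(tokens, usd_per_1m_input_tokens=2.50))
```

Keeping each assumption as a named parameter makes it easy to rerun the estimate when real usage data replaces a guess.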

Azure OpenAI isn’t the only AI service Microsoft offers, though. Azure AI Search, for example, uses a more familiar SKU approach: each SKU combines storage, index, and scale-out limits, billed at an hourly rate per search unit:

Azure AI Search – Pricing example Apr 2025 USD

Additional charges apply for the Custom Entity Lookup skill, document cracking (image extraction), and the semantic ranker.
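A minimal Python sketch of the SKU-style calculation, assuming the Azure AI Search billing model in which search units (SUs) are replicas multiplied by partitions and each SU is charged per hour. The hourly rate is a hypothetical placeholder that depends on the tier, and the add-ons listed above are metered separately and not modeled here.

```python
HOURS_PER_MONTH = 730  # monthly convention used by the Azure pricing calculator


def search_units(replicas: int, partitions: int) -> int:
    """Billable search units: replicas x partitions."""
    if replicas < 1 or partitions < 1:
        raise ValueError("replicas and partitions must both be at least 1")
    return replicas * partitions


def monthly_search_cost(replicas: int, partitions: int, usd_per_su_hour: float) -> float:
    """Base hourly charge only; add-ons such as semantic ranker are billed separately."""
    return search_units(replicas, partitions) * usd_per_su_hour * HOURS_PER_MONTH


# Hypothetical: scaling out from 1x1 to 3 replicas x 2 partitions multiplies the bill by 6.
print(monthly_search_cost(1, 1, usd_per_su_hour=0.50))  # 365.0
print(monthly_search_cost(3, 2, usd_per_su_hour=0.50))  # 2190.0
```

Because replicas are also how the service scales out for availability and query load, resiliency choices show up directly as extra search units on the bill.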

Architecture, usage and cost considerations

For all workloads, including AI, the FinOps Framework capabilities of architecting for the cloud and planning and estimating are key. Together, they combine understanding architectures and usage patterns, understanding the impact of architecture changes, and defining the scope and requirements of your estimates. You can’t figure out how much AI will cost if you don’t understand how it is billed. And if you don’t consider how to optimize or cache tokens (input and output), you may make design decisions that are not cost-optimized.
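As an illustration of why caching matters, here is a Python sketch of input-token cost when part of each prompt is served from a prompt cache. It assumes the service bills cached input tokens at a discounted rate; whether that applies, and at what discount, depends on the model and deployment, so both rates below are hypothetical.

```python
def input_cost_with_cache(
    input_tokens: int,
    cached_fraction: float,          # share of input tokens served from the cache (0 to 1)
    usd_per_1m_input: float,         # hypothetical standard input rate
    usd_per_1m_cached_input: float,  # hypothetical discounted rate for cached tokens
) -> float:
    """Blend standard and cached input-token rates."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached = input_tokens * cached_fraction
    uncached = input_tokens - cached
    return (uncached * usd_per_1m_input + cached * usd_per_1m_cached_input) / 1_000_000


# Hypothetical: 10M input tokens a month, where a long, repeated system prompt
# makes half of them cacheable.
print(input_cost_with_cache(10_000_000, 0.0, 2.50, 1.25))  # 25.0
print(input_cost_with_cache(10_000_000, 0.5, 2.50, 1.25))  # 18.75
```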

It also helps to estimate accurately how much your AI workload will be used. If AI hasn’t yet been adopted in your organization, you may need a smaller launch group or a proof of concept to test your usage assumptions.
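If you do run a pilot, its measurements can replace guesses. The Python sketch below scales per-user monthly token counts from a hypothetical pilot group to a target population. It gives an expected figure from the mean and a deliberately pessimistic figure from the 90th percentile (as if every user were a heavy user). It assumes the pilot group behaves like the wider population, which a small or self-selected pilot may not.

```python
import statistics


def extrapolate_monthly_tokens(pilot_tokens_per_user: list[int], target_users: int) -> dict[str, float]:
    """Scale pilot usage to a larger population.

    Needs at least two pilot measurements for the percentile calculation.
    """
    expected_per_user = statistics.mean(pilot_tokens_per_user)
    # quantiles(n=10) returns 9 cut points; the last one is the 90th percentile.
    p90_per_user = statistics.quantiles(pilot_tokens_per_user, n=10, method="inclusive")[-1]
    return {
        "expected": expected_per_user * target_users,
        "pessimistic": p90_per_user * target_users,
    }


# Hypothetical pilot: monthly tokens used by each of six pilot users.
pilot = [12_000, 8_500, 30_000, 15_500, 9_000, 21_000]
print(extrapolate_monthly_tokens(pilot, target_users=2_000))
```

The gap between the two figures is a useful signal in itself: a wide gap suggests usage is uneven and the estimate deserves a larger buffer.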

Carbon Emissions

Whether or not your organization has formal Environmental, Social, and Governance (ESG) reporting requirements, most people understand that AI services need significant computing power, and the resulting energy use isn’t visible on a pricing sheet. The FinOps Foundation includes guidelines for considering sustainability, setting targets, and making data visible. Azure Carbon Optimization reporting in the Azure portal now lets you understand in depth the emissions resulting from your use of Microsoft Azure services.

Learn more at aka.ms/AzureCarbonOptimization or try our interactive guide on the Azure Carbon Optimization tool.

Coming up next

Next time, we’ll explore cost controls for AI services: what exists to help us limit our AI costs, or get more granular visibility into them?