FinOps Blog

Optimize AI costs by choosing the right Azure OpenAI pricing offer for you

kyleikeda (Microsoft)
Nov 08, 2024

As AI services become integral to business operations, it is crucial to understand how to keep their costs in check while maintaining reliable performance. Azure OpenAI Service empowers businesses to harness advanced AI models, including GPT, for transformative solutions, delivering unparalleled efficiency, scalability, and security within the trusted Azure ecosystem. The platform has two primary pricing offers: Standard (pay-per-token) and Provisioned (provisioned throughput units, or PTUs). Azure OpenAI Service provisioned reservations give customers a further way to optimize their spend on provisioned deployments. Each offer is designed to address different use cases and workloads, making it essential for businesses to choose the right one for their specific needs.

This blog will explore the differences among these offers and when each one is the most beneficial choice for your business. Along the way, we’ll follow a fictional company on its hypothetical journey to optimize AI costs while maintaining performance and reliability.

Understanding token pricing and throughput costs

In these pricing offers, costs are primarily based on the number of tokens processed, which includes both input and output. A token can represent a character, a word, or even a part of a word, depending on how the model's tokenizer splits the text.

The total number of tokens processed in a request depends on the length of your input, output, and request parameters. The quantity of tokens being processed will also affect your response latency and throughput for the models. For applications processing high volumes of data, understanding token usage is critical for managing costs effectively.
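As a rough illustration of the token-based billing described above, the sketch below estimates the cost of a single request from its input and output token counts. The function name, the separate input and output prices, and the price values themselves are placeholder assumptions for illustration, not published Azure OpenAI rates; check the pricing page for real figures.

```python
def estimate_request_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> float:
    """Estimate the pay-per-token cost of one request (illustrative only)."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Hypothetical prices in USD per 1,000 tokens (NOT real Azure rates).
cost = estimate_request_cost(
    input_tokens=1200,
    output_tokens=300,
    input_price_per_1k=0.005,
    output_price_per_1k=0.015,
)
print(f"Estimated cost per request: ${cost:.4f}")
```

Multiplying the per-request estimate by expected request volume gives a first-order view of how prompt length and response length drive spend.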

Standard Pricing: Flexible AI power for scalable solutions

Standard Pricing, also known as the "pay-per-token" offer, charges users based on the number of tokens processed. With the flexible nature of token-based pricing, this offer is ideal for businesses or developers that have fluctuating workloads or are still in the development phase of their projects.

Let’s explore a fictional company, Contoso Solutions, that specializes in building AI-powered chatbots for businesses. They are tasked with developing a new customer service chatbot for a large retail client.

Phase 1: Development and testing

During the development and testing phase, Contoso Solutions is focused on refining the chatbot’s AI model and integrating it with the client’s customer service systems. At this stage, usage is highly variable as the team runs multiple test scenarios, adjusts responses, and works through integration challenges. Given the unpredictable nature of their usage, they opt for Standard Pricing, which is well suited to low- to medium-volume workloads. This allows them to experiment freely and manage costs efficiently, as they are only charged for the tokens processed during testing.

The flexibility of Standard Pricing enables Contoso Solutions to develop and test the chatbot without overcommitting to a fixed level of throughput. Because their usage may fluctuate from day to day, this offer helps them maintain control over their costs while exploring different configurations and AI capabilities.

Provisioned (PTU) Pricing: Resilient AI performance with predictable costs

With Azure OpenAI Service, Standard pricing offers flexibility through pay-per-token billing, making it ideal for short-term or dynamic needs, but some workloads require predictable throughput and minimal latency. For these workloads, PTUs provide a predictable pricing model in which you reserve and deploy a specific amount of model processing capacity. You specify the amount of throughput your Azure OpenAI deployment requires, and the service then allocates the necessary model processing capacity and ensures it's ready for you. Businesses with consistent or predictable workloads benefit from this offer because it provides predictable performance and reserved processing capacity.

Provisioned reservations: Optimize AI budget with Azure reservations

After deploying PTUs in your environment, you might look for ways to optimize cost. With Azure OpenAI Service provisioned reservations, you can save up to 85%* on your provisioned deployments. By committing to a specific number of PTUs in a specific region for a term of one month or one year, you get cost savings compared to PTU hourly pricing, and these savings apply to all models. Customers can purchase provisioned reservations in the Azure portal with Microsoft Cost Management and get purchase recommendations based on their usage with Azure Advisor. Provisioned reservations are ideal for PTUs that have consistent long-term usage.
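The "up to 85%" figure can be reproduced from the approximate rates quoted in the footnote at the end of this post. The short sketch below does that arithmetic; the function name is illustrative, and the rates are the July 2024 approximations from the footnote, so actual savings will differ by model and region.

```python
# Approximate rates from the footnote (Azure pricing as of July 29, 2024).
HOURLY_RATE = 2.00      # provisioned throughput hourly rate, USD/hour
RESERVED_RATE = 0.3028  # reduced rate with a 1-year reservation, USD/hour


def reservation_savings(hourly_rate: float, reserved_rate: float) -> float:
    """Return the fractional saving of the reserved rate versus the hourly rate."""
    return 1 - reserved_rate / hourly_rate


savings = reservation_savings(HOURLY_RATE, RESERVED_RATE)
print(f"Savings versus hourly pricing: {savings:.1%}")  # about 85%
```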

Key Benefits of provisioned reservations

  • Cost savings: Significantly save on PTUs as compared to hourly pricing.
  • Commit on your terms: Utilize one-month or one-year terms to support budget goals.
  • Agile and simple: Savings are applied automatically to provisioned units that are deployed in the region and reservation scope. Reservations are not model specific and will cover deployments for all model types.

Phase 2: Production deployment

Once their chatbot is ready for production, Contoso Solutions expects consistent demand, particularly during peak shopping seasons. Based on historical customer service data, they anticipate a steady flow of customer inquiries, especially during busy periods like holidays or sales events. To ensure the chatbot performs reliably and provides quick and accurate responses, they deploy PTUs. To get additional savings on their PTUs, they purchase provisioned reservations with a one-year commitment.

By committing to a specific level of throughput, Contoso Solutions can guarantee the chatbot’s performance during peak times, ensuring low latency and minimal response delays. PTUs provide the stability and cost predictability needed for production environments, allowing the company to accurately forecast its expenses and ensure a seamless customer experience. Additionally, provisioned reservations enable Contoso to save on costs without risking performance issues.

Need help choosing? We’ve got your back

Deciding between Standard and Provisioned offerings and when to purchase provisioned reservations depends on your application’s stage and usage patterns. We offer an array of tools and learning resources to assist you in making this important decision.
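One way to reason about the cost side of this decision is a simple break-even comparison between pay-per-token spend and a fixed provisioned deployment. The sketch below is a simplification under loudly stated assumptions: every price, the PTU count, and the 730-hour month are hypothetical placeholders, and it ignores whether the chosen number of PTUs can actually serve the token volume, as well as latency and reliability requirements.

```python
HOURS_PER_MONTH = 730  # average hours in a month (assumption for illustration)


def monthly_standard_cost(tokens_per_month: int, price_per_1k_tokens: float) -> float:
    """Pay-per-token cost for a month, using one blended token price."""
    return tokens_per_month / 1000 * price_per_1k_tokens


def monthly_provisioned_cost(num_ptus: int, rate_per_ptu_hour: float) -> float:
    """Fixed monthly cost of a provisioned deployment billed per PTU-hour."""
    return num_ptus * rate_per_ptu_hour * HOURS_PER_MONTH


def cheaper_offer(tokens_per_month: int, price_per_1k_tokens: float,
                  num_ptus: int, rate_per_ptu_hour: float) -> str:
    """Name the offer with the lower estimated monthly cost."""
    standard = monthly_standard_cost(tokens_per_month, price_per_1k_tokens)
    provisioned = monthly_provisioned_cost(num_ptus, rate_per_ptu_hour)
    return "Standard" if standard <= provisioned else "Provisioned"


# Hypothetical inputs: $0.01 per 1,000 tokens, 50 PTUs at $2 per PTU-hour.
for tokens in (500_000_000, 10_000_000_000):
    print(f"{tokens:,} tokens/month -> {cheaper_offer(tokens, 0.01, 50, 2.00)}")
```

With these placeholder numbers, the lower volume favors Standard and the higher volume favors Provisioned, which mirrors the development-then-production path described above.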

Use the Azure OpenAI Service pricing page and the Azure Pricing calculator to compare costs and plan your budget based on your own inputs.

The Azure OpenAI Service capacity calculator helps estimate the number of PTUs required to meet the needs of your workload based on specific workload details, such as peak calls per minute, tokens in a prompt call, and tokens in a model response. The calculator supports predictable performance by forecasting the necessary capacity, optimizes costs by helping you avoid over- or under-provisioning, and simplifies the complex conversion from call characteristics to PTUs. You can also benchmark your deployments based on traffic, which provides a structured and efficient way to plan your capacity and ensure you have the right resources to support your applications.
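The kind of conversion described above, from call characteristics to PTUs, can be sketched roughly as follows. This is not the calculator's actual formula: the tokens-per-minute-per-PTU value is a made-up placeholder (real throughput per PTU varies by model and version), the function name is hypothetical, and the sketch treats input and output tokens identically and ignores any minimum deployment sizes. Use the official calculator for real sizing.

```python
import math


def estimate_ptus(
    peak_calls_per_minute: int,
    prompt_tokens: int,
    response_tokens: int,
    tokens_per_minute_per_ptu: int,
) -> int:
    """Rough PTU estimate from peak traffic (illustrative, not the official method)."""
    tokens_per_call = prompt_tokens + response_tokens
    peak_tokens_per_minute = peak_calls_per_minute * tokens_per_call
    # Round up so the estimate never falls short of peak demand.
    return math.ceil(peak_tokens_per_minute / tokens_per_minute_per_ptu)


# Hypothetical workload and a hypothetical per-PTU throughput figure.
ptus = estimate_ptus(
    peak_calls_per_minute=300,
    prompt_tokens=1000,
    response_tokens=200,
    tokens_per_minute_per_ptu=2500,
)
print(f"Estimated PTUs needed at peak: {ptus}")
```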

Azure Advisor provides cost recommendations to help customers optimize their cloud spend by identifying underutilized resources and suggesting reservations that can lead to cost savings. By analyzing usage patterns and workloads, Azure Advisor offers tailored recommendations for purchasing provisioned reservations.

We also have a dedicated skilling module at Microsoft Learn to further help you understand the differences between the offers and when to use each one. You’ll explore the benefits, use cases, and implementation steps to estimate, deploy, and purchase PTUs for Azure OpenAI Service, and learn how to manage and monitor your provisioned reservations.

Start making informed decisions for your Azure OpenAI journey

In the case of Contoso Solutions, starting with Standard Pricing during development, transitioning to PTUs for production, and then purchasing provisioned reservations enables them to optimize both performance and cost-effectiveness. By understanding the strengths of each offer, your business can make informed decisions that align with its operational needs and financial goals. With Azure OpenAI Service, you have the tools to tailor your pricing offer to suit the unique demands of your AI-driven applications.

Get started today by visiting the Azure OpenAI Service homepage and watching how to deploy your provisioned deployments in Azure.

Additional Resources:

Understanding Azure OpenAI Service deployment types - Azure AI services | Microsoft Learn

Azure OpenAI Service Provisioned Throughput Units (PTU) onboarding - Azure AI services | Microsoft Learn

What are Azure Reservations? - Microsoft Cost Management | Microsoft Learn

Azure Pricing Overview | Microsoft Azure

*The 85% savings are based on the Provisioned Throughput Hourly rate of approximately $2/hour, compared to the reduced rate of a 1-year reservation at approximately $0.3028/hour. Azure pricing as of July 29, 2024 (prices subject to change). Actual savings may vary depending on the specific Azure OpenAI model and region availability.

 

Updated Nov 08, 2024
Version 1.0