Unlock Cost Savings with Azure AI Foundry Provisioned Throughput reservations

Microsoft

May 19, 2025

In the ever-evolving world of artificial intelligence, businesses are constantly seeking ways to optimize costs and streamline operations while adopting cutting-edge technologies. To help, Microsoft recently announced Azure AI Foundry Provisioned Throughput reservations, which address both goals. The offering is coming soon and will let organizations save significantly on AI deployments by committing to a specific amount of throughput. Here’s a high-level look at what the offer is, how it works, and the benefits it brings.

What are Azure AI Foundry Provisioned Throughput reservations?

Prior to this announcement, Azure reservations could apply only to AI workloads running Azure OpenAI Service models. These reservations were called “Azure OpenAI Service Provisioned reservations”. Now that more models are available on Azure AI Foundry and Azure reservations can apply to them, Microsoft is introducing “Azure AI Foundry Provisioned Throughput reservations”.

Azure AI Foundry Provisioned Throughput reservations are a pricing offer for businesses that use Provisioned Throughput Units (PTUs) to deploy AI models. For workloads with predictable consumption patterns, reservations reduce costs by locking in significant discounts compared to hourly pay-as-you-go pricing.

How It Works

The concept is simple yet powerful: instead of paying the hourly PTU rate for your AI model deployments, you pre-purchase a set quantity of PTUs for a specific term (one month or one year), region, and deployment type, and receive a discounted price in return. The reservation applies to a deployment type (Global, Data Zone, or Regional*) and a region. Azure AI Foundry Provisioned Throughput reservations are not model dependent, meaning that you do not have to commit to a model when purchasing.

For example, if you deploy 3 Global PTUs in East US, you can purchase 3 Global PTU reservations in East US to significantly reduce your costs. Note that reservations are tied to deployment type and region: a Global reservation won’t apply to Data Zone or Regional deployments, and an East US reservation won’t apply to West US deployments.
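The matching rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post, not Azure's billing implementation: the `Deployment` and `Reservation` classes, the `reservation_applies` function, and the model names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deployment:
    model: str            # hypothetical model name; ignored when matching
    deployment_type: str  # "Global", "Data Zone", or "Regional"
    region: str
    ptus: int


@dataclass(frozen=True)
class Reservation:
    deployment_type: str
    region: str
    ptus: int


def reservation_applies(res: Reservation, dep: Deployment) -> bool:
    """A reservation matches on deployment type and region only, never on model."""
    return res.deployment_type == dep.deployment_type and res.region == dep.region


res = Reservation("Global", "East US", 3)
deployments = [
    Deployment("model-a", "Global", "East US", 3),     # same type and region
    Deployment("model-b", "Global", "East US", 2),     # different model, still matches
    Deployment("model-a", "Data Zone", "East US", 3),  # wrong deployment type
    Deployment("model-a", "Global", "West US", 3),     # wrong region
]
matches = [reservation_applies(res, d) for d in deployments]
print(matches)  # [True, True, False, False]
```

The sketch leaves out the reservation scope (shared, management group, subscription, or resource group), which also has to cover the deployment.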

Key Benefits

Azure AI Foundry Provisioned Throughput reservations offer several benefits that make them an attractive option for organizations:

Cost Savings: By committing to a reservation, businesses can save up to 70% compared to hourly pricing***. This makes it an ideal choice for production workloads, large-scale deployments, and steady usage patterns.
Budget Control: Reservations are available for one-month or one-year terms, allowing organizations to align costs with their budget goals. Flexible terms ensure businesses can choose what works best for their financial planning.
Streamlined Billing: The reservation discount applies automatically to matching deployments, simplifying cost management and ensuring predictable expenditures.
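As a rough check on the "up to 70%" figure, the arithmetic can be reproduced from the approximate rates quoted in the footnote at the end of this post (GPT-4o Global, pricing as of May 1, 2025). The function and constant names here are illustrative, and actual rates vary by model and region.

```python
HOURLY_RATE = 1.00      # approximate pay-as-you-go rate in USD per hour (from the footnote)
RESERVED_RATE = 0.3027  # approximate 1-year reservation rate in USD per hour (from the footnote)


def savings_fraction(hourly: float, reserved: float) -> float:
    """Fraction saved by paying the reserved rate instead of the hourly rate."""
    return 1 - reserved / hourly


savings = savings_fraction(HOURLY_RATE, RESERVED_RATE)
print(f"{savings:.2%}")  # 69.73%
```

The result, about 69.7%, rounds to the 70% quoted above.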

How to Purchase a reservation

Purchasing an Azure AI Foundry Provisioned Throughput reservation is straightforward:

Sign in to the Azure Portal and navigate to the Reservations section.
Select the scope you want the reservation to apply to (shared, management group, single subscription, or single resource group).
Select the deployment type (Global, Data Zone, or Regional) and the Azure region you want to cover.
Specify the quantity of PTUs and the term (one month or one year).
Add the reservation to your cart and complete the purchase.

Reservations can be paid for upfront or through monthly payments, depending on your subscription type. The reservation begins immediately upon purchase and applies to any deployments matching the reservation's attributes.

Best Practices

Important: Azure reservations are NOT deployments; they relate entirely to billing. An Azure reservation doesn’t guarantee capacity, and capacity availability is very dynamic.

To maximize the value of your reservation, follow these best practices:

Deploy First: Create your deployments before purchasing a reservation to ensure you don’t overcommit to PTUs you may not use.
Match Deployment Attributes: Ensure the scope, region, and deployment type of your reservation align with your actual deployments.
Plan for Renewal: Reservations can be set to auto-renew, ensuring continuous cost savings without service interruptions.
Monitor and Manage: After purchasing a reservation, regularly monitor its utilization and set up budget alerts.
Exchange Reservations: Exchange your reservations if your workloads change during the term.
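To see why deploying first and monitoring utilization matter, here is a minimal sketch of the underlying arithmetic. It rests on assumptions not spelled out in this post: that a reservation is billed for every hour of its term whether or not matching PTUs are deployed, that PTUs deployed beyond the reserved quantity are billed at the hourly rate, and that the approximate footnote rates for GPT-4o Global apply per PTU. All names and the sample data are hypothetical.

```python
HOURLY_RATE = 1.00      # approximate pay-as-you-go USD per PTU-hour (assumed per PTU)
RESERVED_RATE = 0.3027  # approximate 1-year reservation USD per PTU-hour (assumed per PTU)


def hourly_cost(deployed_ptus: int, reserved_ptus: int) -> float:
    """Cost of one hour: reserved PTUs are always paid for, overage is billed hourly."""
    overage = max(0, deployed_ptus - reserved_ptus)
    return reserved_ptus * RESERVED_RATE + overage * HOURLY_RATE


def utilization(deployed_by_hour: list[int], reserved_ptus: int) -> float:
    """Share of reserved PTU-hours actually covered by matching deployments."""
    used = sum(min(d, reserved_ptus) for d in deployed_by_hour)
    return used / (reserved_ptus * len(deployed_by_hour))


deployed_by_hour = [3, 3, 2, 0]  # hypothetical PTUs deployed in four sample hours
reserved = 3

util = utilization(deployed_by_hour, reserved)
break_even = RESERVED_RATE / HOURLY_RATE  # utilization below this costs more than hourly pricing
worth_it = util >= break_even

print(f"utilization: {util:.0%}, break-even: {break_even:.0%}, worth it: {worth_it}")
```

Under these assumptions, a 1-year reservation at the footnoted rates pays for itself only while utilization stays above roughly 30%, which is one reason to size the reservation against deployments that already exist.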

Why Choose Azure AI Foundry Provisioned Throughput reservations?

Azure AI Foundry Provisioned Throughput reservations are a perfect blend of cost efficiency and flexibility. Whether you’re deploying AI models for real-time processing, large-scale data transformations, or enterprise applications, this offering helps you reduce costs while maintaining high performance. By committing to a reservation, you can not only save money but also streamline your billing and gain better control over your AI expenses.

Conclusion

As businesses continue to adopt AI technologies, managing costs becomes a critical factor in ensuring scalability and success. Azure AI Foundry Provisioned Throughput reservations empower organizations to achieve their AI goals without breaking the bank. By aligning your workload requirements with this innovative offer, you can unlock significant savings while maintaining the flexibility and capabilities needed to drive innovation.

Ready to get started? Learn more about Azure reservations, and watch for Azure AI Foundry Provisioned Throughput reservations to become available for purchase in your Azure portal.

Additional Resources:

What are Azure Reservations? - Microsoft Cost Management | Microsoft Learn

Azure Pricing Overview | Microsoft Azure

Azure Essentials | Microsoft Azure

Azure AI Foundry | Microsoft Azure

*Not all models will be available regionally.

**Not all models will be available for Azure AI Foundry Provisioned Throughput reservations.

***The 70% savings is based on the GPT-4o Global provisioned throughput Azure hourly rate of approximately $1/hour, compared to the reduced rate of a 1-year Azure reservation at approximately $0.3027/hour. Azure pricing as of May 1, 2025 (prices subject to change). Actual savings may vary depending on the specific large language model and region availability.

Updated May 16, 2025

kyleikeda