Understanding Azure OpenAI Service Provisioned Reservations

Hello Team, We are building a Azure OpenAI based finetuned model making use of GPT 4o-mini for long run. Wanted to understand the costing, here we came up with the following question over Azure ...

GPT-4o

Provisioned Throughput

ml4u
Apr 20, 2025
Provisioned Throughput Units (PTUs) in Azure OpenAI Service provide predictable costs and performance by allocating a specific throughput capacity. The token quota limit for fine-tuned models is typically measured in Tokens Per Minute (TPM) rather than a fixed number of tokens. As for deploying multiple fine-tuned models, the pricing is based on the total throughput capacity allocated. You can deploy multiple models as long as the combined throughput remains within the reserved capacity. It's advisable to monitor usage and adjust the capacity as needed to optimize costs.

ml4u

Brass Contributor

Apr 20, 2025

Provisioned Throughput Units (PTUs) in Azure OpenAI Service provide predictable costs and performance by allocating a specific throughput capacity. The token quota limit for fine-tuned models is typically measured in Tokens Per Minute (TPM) rather than a fixed number of tokens. As for deploying multiple fine-tuned models, the pricing is based on the total throughput capacity allocated. You can deploy multiple models as long as the combined throughput remains within the reserved capacity. It's advisable to monitor usage and adjust the capacity as needed to optimize costs.

sachins
Copper Contributor
Apr 22, 2025
If we are doing it on test server on different subscription for client and want to transfer/share to main subscription, is it possible, on same region?

Forum Discussion

Understanding Azure OpenAI Service Provisioned Reservations