Forum Discussion
Understanding Azure OpenAI Service Provisioned Reservations
- Apr 21, 2025
Provisioned Throughput Units (PTUs) in Azure OpenAI Service provide predictable costs and performance by allocating a specific throughput capacity. The token quota limit for fine-tuned models is typically measured in Tokens Per Minute (TPM) rather than a fixed number of tokens. As for deploying multiple fine-tuned models, the pricing is based on the total throughput capacity allocated. You can deploy multiple models as long as the combined throughput remains within the reserved capacity. It's advisable to monitor usage and adjust the capacity as needed to optimize costs.
Provisioned Throughput Units (PTUs) in Azure OpenAI Service provide predictable costs and performance by allocating a specific throughput capacity. The token quota limit for fine-tuned models is typically measured in Tokens Per Minute (TPM) rather than a fixed number of tokens. As for deploying multiple fine-tuned models, the pricing is based on the total throughput capacity allocated. You can deploy multiple models as long as the combined throughput remains within the reserved capacity. It's advisable to monitor usage and adjust the capacity as needed to optimize costs.
- sachinsApr 22, 2025Copper Contributor
If we are doing it on test server on different subscription for client and want to transfer/share to main subscription, is it possible, on same region?