Forum Discussion
Re: Understanding Azure OpenAI Service Provisioned Reservations
Hi Sachins,
Azure OpenAI Service Provisioned Throughput Units (PTUs) are designed to provide predictable costs and performance by allocating a specific throughput capacity. There isn't a specific token quota limit for fine-tuned model deployments under PTUs; instead, throughput is measured in Tokens Per Minute (TPM).
You can use the Azure capacity planner to estimate the required PTUs based on your expected TPM usage.
https://azure.microsoft.com/en-us/pricing/details/cognitive-services/openai-service/
You can deploy multiple fine-tuned models under the Provisioned capacity plan. Pricing is driven by the total throughput required across all models combined: each model's deployment consumes a portion of the allocated PTUs, so you need to ensure that the total TPM usage across all models does not exceed the provisioned capacity.
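To make the planning concrete, here's a minimal sketch of the arithmetic involved. The tokens-per-minute-per-PTU figure and the deployment numbers below are illustrative assumptions, not official rates; use the Azure capacity planner for actual values, which vary by model.

```python
# Sketch: checking whether combined TPM across deployments fits the
# provisioned capacity. TPM_PER_PTU is an assumed figure for illustration.
TPM_PER_PTU = 2_500  # assumption; real throughput per PTU varies by model

def required_ptus(total_tpm: int, tpm_per_ptu: int = TPM_PER_PTU) -> int:
    """Round up to the number of PTUs needed for the given total TPM."""
    return -(-total_tpm // tpm_per_ptu)  # ceiling division

# Expected TPM for each fine-tuned model deployment (example numbers)
deployments = {"model-a": 40_000, "model-b": 25_000, "model-c": 10_000}

total_tpm = sum(deployments.values())
ptus_needed = required_ptus(total_tpm)
provisioned_ptus = 30  # example allocation

print(f"Total TPM: {total_tpm}, PTUs needed: {ptus_needed}")
if ptus_needed > provisioned_ptus:
    print("Warning: combined TPM exceeds provisioned capacity")
```

The point is simply that all deployments draw from one shared PTU pool, so you size the pool against the sum of their expected TPM, not each model individually.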
Could you share more about your expected usage patterns, and how many models you plan to deploy?
1 Reply
- ml4u (Brass Contributor)
Great explanation, Abdulrhman. To add: when deploying multiple fine-tuned models under the Provisioned capacity plan, it's important to monitor total throughput to ensure it stays within the allocated PTUs. This helps avoid exceeding the provisioned capacity and keeps performance predictable. Keep in mind that each model's deployment contributes to the overall throughput usage, so planning and monitoring are key.