Forum Discussion

sachins's avatar
sachins
Copper Contributor
Mar 28, 2025
Solved

Understanding Azure OpenAI Service Provisioned Reservations

Hello Team,   We are building a Azure OpenAI based finetuned model making use of GPT 4o-mini for long run. Wanted to understand the costing, here we came up with the following question over Azure ...
  • ml4u's avatar
    Apr 21, 2025

    Provisioned Throughput Units (PTUs) in Azure OpenAI Service provide predictable costs and performance by allocating a specific throughput capacity. The token quota limit for fine-tuned models is typically measured in Tokens Per Minute (TPM) rather than a fixed number of tokens. As for deploying multiple fine-tuned models, the pricing is based on the total throughput capacity allocated. You can deploy multiple models as long as the combined throughput remains within the reserved capacity. It's advisable to monitor usage and adjust the capacity as needed to optimize costs.

Resources