Introduction to Message Quota of Azure OpenAI Service


Generated using Microsoft Designer

 

Azure OpenAI's quota feature lets you assign rate limits to your deployments, up to a global limit called your “quota.” Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). When you onboard a subscription to Azure OpenAI, you receive a default quota for most available models. You then assign TPM to each deployment as you create it, and the available quota for that model is reduced by that amount. You can keep creating deployments and assigning them TPM until you reach your quota limit. After that, you can create new deployments of that model only by reducing the TPM assigned to other deployments of the same model (which frees TPM for use), or by requesting, and being approved for, a quota increase for that model in the desired region.
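The bookkeeping described above can be sketched as a small ledger. This is only an illustration of the accounting, not an Azure API: the `ModelQuota` class, its method names, the deployment names, and the 240,000 TPM figure are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelQuota:
    """Hypothetical ledger for one model in one region (not an Azure API)."""
    limit_tpm: int  # quota granted to the subscription for this model and region
    deployments: dict[str, int] = field(default_factory=dict)  # name -> assigned TPM

    @property
    def available_tpm(self) -> int:
        # Quota not yet assigned to any deployment.
        return self.limit_tpm - sum(self.deployments.values())

    def assign(self, name: str, tpm: int) -> None:
        """Create a deployment, or change its TPM, if the remaining quota allows it."""
        current = self.deployments.get(name, 0)
        extra_needed = tpm - current  # negative when TPM is being reduced
        if extra_needed > self.available_tpm:
            raise ValueError(
                f"need {extra_needed} more TPM, only {self.available_tpm} available"
            )
        self.deployments[name] = tpm


quota = ModelQuota(limit_tpm=240_000)  # illustrative figure, not a real default
quota.assign("chat-prod", 200_000)
quota.assign("chat-dev", 40_000)       # quota is now fully assigned

# A third deployment fits only after TPM is freed from an existing one.
quota.assign("chat-prod", 150_000)
quota.assign("chat-test", 50_000)
```

In this sketch, lowering the TPM of `chat-prod` is what makes room for `chat-test`; the other route mentioned above, a quota increase, would correspond to raising `limit_tpm`.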

quota.png
 
  • Quota is assigned to your subscription on a per-region, per-model basis in units of Tokens-per-Minute (TPM). 
  • TPM is assigned to each deployment when you create it.
  • TPM can be modified in increments of 1,000.
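If you script TPM changes, it can help to check values against the 1,000 increment before submitting them. The helpers below are a minimal sketch: the function names are hypothetical, and the requirement that TPM be positive is an assumption not stated in this post.

```python
TPM_STEP = 1_000  # increment size stated in the list above


def is_valid_tpm(tpm: int) -> bool:
    """True if tpm is a positive multiple of TPM_STEP (positivity is an assumption)."""
    return tpm > 0 and tpm % TPM_STEP == 0


def round_down_to_step(tpm: int) -> int:
    """Largest multiple of TPM_STEP that does not exceed tpm."""
    return (tpm // TPM_STEP) * TPM_STEP


requested = 12_750
if not is_valid_tpm(requested):
    requested = round_down_to_step(requested)  # 12_750 becomes 12_000
```

Rounding down rather than up keeps the adjusted value within whatever quota the original request was sized for.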
 

OpenAI Quotas.png

View and request quota

 

Quota Name: The quota name can be expanded in the UI to show the deployments that are using the quota.

Deployment: Model deployments divided by model class.

Usage/Limit: The amount of quota used by deployments relative to the quota limit. The amount used is also represented in the bar graph.

Request Quota: The icon in this field navigates to a form where requests to increase quota can be submitted.

Migrating existing deployments.
