Azure AI Foundry Blog

Best Practices for Requesting Quota Increase for Azure OpenAI Models

ellienosrat
Microsoft
Mar 26, 2025

Introduction

This document outlines a set of best practices to guide users in submitting quota increase requests for Azure OpenAI models. Following these recommendations will help streamline the process, ensure proper documentation, and improve the likelihood of a successful request. 

Understand the Quota and Limitations

Before submitting a quota increase request, make sure you have a clear understanding of:

  • The current quota and limits for your Azure OpenAI instance.
  • Your specific use case requirements, including estimated daily/weekly/monthly usage.
  • The rate limits for API calls and how they affect your solution's performance.

Use the Azure portal or CLI to monitor your current usage and identify patterns that justify the need for a quota increase.
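If you call the service from code, the rate-limit headers returned with each response also give a quick view of how close you are running to your limits. The sketch below is a minimal example assuming the openai v1 Python SDK, a deployment named gpt-4o, and the standard x-ratelimit-* header names; verify the deployment name, API version, and headers against your own environment.

```python
# Minimal sketch: spot-check remaining quota via rate-limit response headers.
# Deployment name, API version, and header names are assumptions to verify.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",  # use the API version your deployment targets
)

# with_raw_response exposes the HTTP headers alongside the parsed completion.
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",  # your deployment name
    messages=[{"role": "user", "content": "ping"}],
)
completion = raw.parse()

print("Tokens used by this call:", completion.usage.total_tokens)
print("Remaining requests this window:", raw.headers.get("x-ratelimit-remaining-requests"))
print("Remaining tokens this window:", raw.headers.get("x-ratelimit-remaining-tokens"))
```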

Provide a Clear and Detailed Justification

When requesting a quota increase, provide a well-documented justification. This should include:

Use Case Description

Provide an overview of how you are using Azure OpenAI services, including details about the application or platform it supports.

User Impact

Explain how the quota increase will benefit end users or improve the solution's performance. If possible, include details of the expected impact on Azure consumption.

Current Limits vs. Required Limits

Clearly state your current quota and the requested increase. For example:

  • Current limit: 10,000 tokens per minute
  • Requested limit: 50,000 tokens per minute
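A back-of-the-envelope calculation like the one below makes the requested figure easy to defend. All numbers are hypothetical placeholders; substitute your own measurements.

```python
# Hypothetical sizing exercise for a tokens-per-minute (TPM) request.
peak_requests_per_minute = 60      # observed or projected peak RPM
avg_prompt_tokens = 550            # average input tokens per request
avg_completion_tokens = 250        # average output tokens per request
headroom = 1.25                    # buffer for retries and traffic bursts

tokens_per_request = avg_prompt_tokens + avg_completion_tokens
required_tpm = peak_requests_per_minute * tokens_per_request * headroom
print(f"Requested limit: ~{required_tpm:,.0f} tokens per minute")
# -> Requested limit: ~60,000 tokens per minute
```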

Supporting Data

Include quantitative data such as the following (a small aggregation sketch appears after the list):

  • Historical usage trends
  • Growth projections (e.g., user base, API calls, or token consumption)
  • Expected peak usage scenarios (e.g., seasonal demand or events)
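As one way to produce these figures, a short script can turn raw per-request token counts into daily totals and a rough projection. The log format and the 20% growth assumption below are purely illustrative.

```python
# Illustrative aggregation of token usage into a trend plus a naive projection.
from collections import defaultdict
from datetime import date

# (date, total_tokens) pairs pulled from your own telemetry or logs.
request_log = [
    (date(2025, 3, 1), 1200), (date(2025, 3, 1), 900),
    (date(2025, 3, 2), 1500), (date(2025, 3, 2), 2100),
]

daily_totals = defaultdict(int)
for day, tokens in request_log:
    daily_totals[day] += tokens

for day in sorted(daily_totals):
    print(day.isoformat(), daily_totals[day], "tokens")

# Naive projection: assume 20% month-over-month growth for the next quarter.
monthly_tokens = sum(daily_totals.values())
for month in range(1, 4):
    monthly_tokens *= 1.20
    print(f"Projected month +{month}: {monthly_tokens:,.0f} tokens")
```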

Follow Prompt Engineering Best Practices

As a first step, ensure you are optimizing your usage of Azure OpenAI models by adhering to prompt engineering best practices (a brief example follows this list):

  • Plain Text Over Complex Formats – Use simple, clear prompts to minimize errors and improve model efficiency.
  • JSON Format (if applicable) – If structured data is required, experiment with lightweight and efficient JSON schemas.
  • Clear Instructions – Provide concise and unambiguous instructions in your prompts to reduce unnecessary token consumption.
  • Iterative Refinement – Continuously refine prompts to strike a balance between response quality and token usage.
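As a brief illustration of these points, the sketch below uses a short system instruction, requests compact JSON, and caps the completion length. The deployment name, API version, and response_format support are assumptions to check against your model.

```python
# Minimal sketch: concise instructions plus structured JSON output to keep
# token consumption predictable. Deployment name and API version are assumptions.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # your deployment name
    messages=[
        {"role": "system", "content": "Extract vendor, total, and due_date as compact JSON. No prose."},
        {"role": "user", "content": 'Invoice: "ACME Corp, total $1,250.00, due 2025-04-30"'},
    ],
    response_format={"type": "json_object"},  # requires a model/API version that supports it
    max_tokens=100,  # cap completion tokens
)

print(response.choices[0].message.content)
print("Total tokens used:", response.usage.total_tokens)
```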

Optimize API Usage

Ensure that your solution is designed to optimize API usage (a retry-and-caching sketch follows this list):

  • Batch Requests/Processes – Combine multiple smaller requests into a single larger request where possible, or use the batch endpoint option in Azure OpenAI.
  • Rate-Limit Handling – Implement robust retry mechanisms and backoff strategies to handle rate-limited responses gracefully.
  • Caching Responses – Cache frequently requested results to minimize redundant API calls.
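Below is a minimal sketch of the retry and caching ideas above, assuming the openai v1 Python SDK; the backoff parameters and cache shape are illustrative, not a prescribed implementation.

```python
# Minimal sketch: exponential backoff on 429 responses plus an in-memory cache.
import os
import random
import time
from functools import lru_cache

from openai import AzureOpenAI, RateLimitError

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

def complete_with_backoff(prompt: str, max_retries: int = 5) -> str:
    delay = 1.0
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model="gpt-4o",  # your deployment name
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter so concurrent clients don't retry in lockstep.
            time.sleep(delay + random.uniform(0, 0.5))
            delay *= 2

@lru_cache(maxsize=1024)
def cached_complete(prompt: str) -> str:
    # Identical prompts are answered from memory instead of re-calling the API.
    return complete_with_backoff(prompt)
```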

Include Architectural and Operational Details

If your request involves high usage or complex architecture, provide additional details:

  • System Architecture Overview – Describe the architecture of your solution, including how Azure OpenAI services integrate with other platforms (e.g., AWS or on-premises systems).
  • Rate Limiting Strategy – Highlight any strategies in place to manage rate limits, such as prioritizing critical requests or implementing queuing mechanisms.
  • Monitoring and Alerts – Share your approach to monitoring API usage and setting up alerts for potential issues.

Submit with Supporting Documentation

Prepare and attach any supporting materials that can strengthen your case, and keep the request itself clear and well structured:

  • Usage Reports – Include graphs, charts, or tables showing current consumption trends.
  • Excel Calculations – Provide detailed calculations related to rate limit requirements, if applicable.
  • Screenshots or Diagrams – Share visuals that demonstrate your use case or architectural setup.
  • Concise and Structured Format – Use bullet points or numbered lists for clarity.
  • Alignment with Business Goals – Connect the request to business outcomes, such as improved customer experience or scalability for growth.
  • Free of Ambiguity – Avoid vague language; provide specific details wherever possible.

 

Additional AI Best Practices blog posts:

Best Practices for Leveraging Azure OpenAI in Code Conversion Scenarios

Best Practices for Leveraging Azure OpenAI in Constrained Optimization Scenarios

Best Practices for Structured Extraction from Documents Using Azure OpenAI

Updated Mar 28, 2025
Version 3.0

6 Comments

  • crodriguez1
    Brass Contributor

    The last step of the quota increase form requires a value in a specified format that isn't clear to me at all. For example, for GPT-4o I want the default 450K TPM, which grants 2.7K RPM. What am I supposed to put in the text box?

     

    • ellienosrat
      Microsoft

      Hi, for GPT-4o you will just go with the default ratio, which means for 450K TPM you should enter 450 in the text field.

  • Raiya
    Copper Contributor

    Hello Ellie,

    I am Raiya, from an Azure CSP partner company.

    We have been experiencing cases where requests for a quota increase for Azure OpenAI models are not being approved, so we appreciate you publishing this article.

    I have one question.

    Are there any cases where requests that were previously rejected were later approved after submitting the documents described in this article to the contact point?

    Thank you!

    • crodriguez1
      Brass Contributor

      (sorry, posted as a reply instead of as a new comment; I've "moved" it)

    • ellienosrat
      Microsoft

      Hi Raiya, do you have a Microsoft team supporting you? If so, they can use our internal tooling to escalate this for you! If not, my recommendation would be to send another request, providing the details described above.

      • Raiya
        Copper Contributor

        Hi Ellie,

        Thanks for your response.

        I do have a support team, but they mentioned they can’t answer questions about "Requesting Quota Increase for Azure OpenAI Models."
        So, I wanted to ask whether you can share any information on how approvals improve when requests follow the guidance in your article.