
Azure Integration Services Blog

GPT-4o Support and New Token Management Feature in Azure API Management

akamenev, Microsoft
Nov 19, 2024

We’re happy to announce new features in Azure API Management that enhance your experience with GenAI APIs. Our latest release brings expanded support for GPT-4o, including text- and image-based input, across all GenAI Gateway capabilities. Additionally, we’re extending our token limit policy with a token quota capability to give you even more control over your token consumption.

Token quota

This extension of the token limit policy is designed to help you manage token consumption more effectively when working with large language models (LLMs).

Key benefits of token quota:

  1. Flexible quotas: In addition to rate limiting, set token quotas on an hourly, daily, weekly, or monthly basis to manage token consumption across clients, departments, or projects.
  2. Cost management: Protect your organization from unexpected token usage costs by aligning quotas with your budget and resource allocation.
  3. Enhanced visibility: In combination with the emit-token-metric policy, track and analyze token usage patterns and make informed adjustments based on real usage trends.
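As a sketch of how a quota might look in policy, the snippet below extends the token limit policy with a weekly cap per subscription. Attribute names such as `token-quota` and `token-quota-period`, and the values shown, are illustrative of the documented `azure-openai-token-limit` policy; check the documentation for your service tier, as exact names and limits may differ.

```xml
<policies>
    <inbound>
        <base />
        <!-- Rate-limit to 500 tokens per minute per subscription,
             and cap total consumption at 100,000 tokens per week. -->
        <azure-openai-token-limit
            counter-key="@(context.Subscription.Id)"
            tokens-per-minute="500"
            token-quota="100000"
            token-quota-period="Weekly"
            estimate-prompt-tokens="true"
            remaining-tokens-header-name="remaining-tokens" />
    </inbound>
</policies>
```

Keying the counter on the subscription ID scopes the quota per client; other expressions (for example, an IP address or a custom header) can scope it per department or project instead.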

With this new capability, you can empower your developers to innovate while maintaining control over consumption and costs. It’s the perfect balance of flexibility and responsible consumption for your AI projects. Learn more about token quota in our documentation.
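On the visibility side, the `azure-openai-emit-token-metric` policy can send token counts to Application Insights for analysis; a minimal sketch (the namespace and dimension choices below are illustrative):

```xml
<azure-openai-emit-token-metric namespace="openai">
    <!-- Emit token usage metrics split by subscription and API. -->
    <dimension name="Subscription ID" value="@(context.Subscription.Id)" />
    <dimension name="API ID" value="@(context.Api.Id)" />
</azure-openai-emit-token-metric>
```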

GPT-4o support

GPT-4o integrates text and images in a single model, enabling it to handle multiple content types simultaneously. Our latest release enables you to take advantage of the full power of GPT-4o with expanded support across all GenAI Gateway capabilities in Azure API Management.
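For reference, a multimodal chat completion request routed through your API Management gateway can mix text and image content parts in a single message. The body below follows the Azure OpenAI chat completions format; the image URL is a placeholder:

```json
{
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "text", "text": "What is shown in this image?" },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/chart.png" }
        }
      ]
    }
  ],
  "max_tokens": 500
}
```

Because both content types flow through the same gateway request, the token limit, quota, and metric policies above apply to multimodal calls without additional configuration.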

 

Key benefits:

  • Cost efficiency: Control and attribute costs with token monitoring, limits, and quotas. Return cached responses for semantically similar prompts.
  • High reliability: Enable geo-redundancy and automatic failovers with load balancing and circuit breakers.
  • Developer enablement: Replace custom backend code with built-in policies. Publish AI APIs for consumption.
  • Enhanced governance and monitoring: Centralize monitoring and logs for your AI APIs.
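As one example of these capabilities, the semantic caching policies can return a cached completion for prompts that are semantically similar to earlier ones. A minimal sketch, assuming an embeddings backend named `embeddings-backend` is already configured (the threshold and cache duration are illustrative):

```xml
<policies>
    <inbound>
        <base />
        <!-- Look up a cached response for prompts whose embedding falls
             within the similarity threshold of a previous prompt. -->
        <azure-openai-semantic-cache-lookup
            score-threshold="0.05"
            embeddings-backend-id="embeddings-backend"
            embeddings-backend-auth="system-assigned" />
    </inbound>
    <outbound>
        <base />
        <!-- Cache the completion for 60 seconds. -->
        <azure-openai-semantic-cache-store duration="60" />
    </outbound>
</policies>
```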

Phased rollout and availability

We’re excited about these new features and want to ensure you have the most up-to-date information about their availability. As with any major update, we’re implementing a phased rollout strategy to ensure safe deployment across our global infrastructure. Because of this, some of your services may not receive these updates until the deployment is complete. These new features will be available first in the new SKUv2 of Azure API Management, followed by the SKUv1 rollout toward the end of 2024.

Conclusion

These new features in Azure API Management represent a step forward in managing and governing your use of GPT-4o and other LLMs. By providing greater control, visibility, and traffic management capabilities, we’re helping you unlock the full potential of Generative AI while keeping resource usage in check.

We’re excited about the possibilities these new features bring and are committed to expanding their availability. As we continue our phased rollout, we appreciate your patience and encourage you to keep an eye out for the updates.

Updated Nov 18, 2024
Version 1.0