Scaling AI adoption requires a unified control plane
As organizations scale generative AI adoption, they face growing complexity managing multiple AI providers, models, API formats, and rapid release cycles. Without a unified control plane, enterprises risk fragmented governance, inconsistent developer experiences, and uncontrolled AI consumption costs.
As an AI Gateway, Azure API Management enables organizations to implement centralized AI mediation, governance, and developer access control across AI services. This blog post introduces the Unified AI Gateway design pattern, an architecture pattern developed by Uniper that builds on API Management’s policy extensibility to create a flexible, maintainable solution for managing AI services across providers, models, and environments. Uniper runs this pattern in production today to optimize AI governance and operational efficiency, enhance the developer experience, and manage AI costs.
Note: The Unified AI Gateway described in this post is a customer-implemented design pattern built using Azure API Management policy extensibility.
Customer spotlight: Uniper
Uniper is a leading European energy company with a global footprint, generating, trading, and delivering electricity and natural gas through a diverse portfolio spanning hydro, wind, solar, nuclear, and flexible thermal assets. With a strategy centered on accelerating the energy transition, Uniper provides reliable and innovative energy solutions that power industries, strengthen grids, and support communities across its core markets.
Committed to becoming one of Europe’s first AI-driven utilities, Uniper views artificial intelligence as a strategic cornerstone for future competitiveness, efficiency, and operational transformation. Building on a strong foundation of AI and machine-learning solutions—from plant optimization and predictive maintenance to advanced energy trading—Uniper is now scaling the adoption of generative AI (GenAI) across all business functions.
At Uniper, AI is not just a technology enhancer—it is a business imperative. The momentum for AI-driven transformation starts within Uniper’s business areas, with the technology organization enabling and empowering this evolution through responsible, value-focused AI deployment.
Enterprise challenges when scaling AI services
As Uniper expanded AI adoption, they encountered challenges common across enterprises implementing multi-model and multi-provider AI architectures:
- API growth and management overhead – Using a conventional REST/SOAP API definition approach, each combination of AI provider, model, API type, and version typically results in a separate API schema definition in API Management. As AI services evolve, the number of API definitions can grow significantly, increasing management overhead.
- Limited routing flexibility – Each API schema definition is typically linked to a static backend, which prevents dynamic routing decisions based on factors like model cost, capacity, or performance (e.g., routing to gpt-4.1-mini instead of gpt-4.1).
Because AI services evolve rapidly, this approach creates combinatorial growth in API definitions and ongoing management overhead:
- Separate APIs are typically needed for each of the following:
o AI service provider (e.g., Microsoft Foundry, Google Gemini)
o API type (e.g., OpenAI, Inference, Responses)
o Model (e.g., gpt-4.1, gpt-4.1-mini, phi-4)
- Each AI service also supports multiple versions. For instance, OpenAI might include:
o 2025-01-01-preview (latest features)
o 2024-10-21 (stable release)
o 2024-02-01 (legacy support)
- Different request patterns may be required. For example, Microsoft Foundry's OpenAI service supports chat completions using both:
o OpenAI v1 format (/v1/chat/completions)
o Azure OpenAI format (/openai/deployments/{model}/chat/completions)
- Each API definition may be replicated across environments. For example, Development, Test, and Production API Management environments.
The Unified AI Gateway design pattern
To address these challenges, Uniper implemented a policy-driven enterprise AI mediation layer using Azure API Management.
At a high level, the pattern creates a single enterprise AI access layer that:
- Normalizes requests across providers and models
- Enforces consistent authentication and governance
- Dynamically routes traffic across AI services
- Provides centralized observability and cost controls
The design emphasizes modular policy components that provide centralized, auditable control over security, routing, quotas, and monitoring.
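To illustrate the modular structure, the top-level API policy can be sketched as a thin composition of policy fragments, each owning one responsibility. This is a simplified sketch, not Uniper’s actual configuration; the fragment IDs shown here are illustrative names.

```xml
<!-- Illustrative top-level policy for the wildcard API. Each include-fragment
     reference pulls in one modular component; fragment IDs are placeholders. -->
<policies>
    <inbound>
        <base />
        <include-fragment fragment-id="unified-authentication" />
        <include-fragment fragment-id="path-construction" />
        <include-fragment fragment-id="backend-selection" />
        <include-fragment fragment-id="token-limiting" />
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
        <include-fragment fragment-id="usage-logging" />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>
```

Because each fragment has a single responsibility, individual components can be updated or swapped without touching the rest of the pipeline.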
Core architecture components
The following components are involved in the Unified AI Gateway pattern:
- Single wildcard API definition with wildcard operations (/*) that minimizes API management overhead. No API definition changes are required when introducing new AI providers, models, or APIs.
- Unified authentication that enforces consistent authentication for every request, supporting both API key and JWT validation for inbound requests, with managed identity used for backend authentication to AI services.
- Optimized path construction that automatically transforms requests to simplify consuming AI services, such as automatic API version selection (for example, transforming /deployments/gpt-4.1-mini/chat/completions to /openai/deployments/gpt-4.1-mini/chat/completions?api-version=2025-01-01-preview).
- Model and API aware backend selection that dynamically routes requests to backend AI services and load balancing pools based on capacity, cost, performance, and other operational factors.
- Circuit breaker and load balancing that leverages API Management’s built-in circuit breaker functionality with load balancing pools to provide resiliency across backend AI services deployed in different regions. When endpoints reach failure thresholds, traffic automatically rebalances to healthy regional instances.
- Tiered token limiting that enforces token consumption limits using API Management’s llm-token-limit policy with tiered quota thresholds.
- Comprehensive trace logging and monitoring using Application Insights to provide robust usage tracking and operational insights, including token tracking through API Management’s llm‑emit‑token‑metric policy.
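As a sketch of how the path-construction and backend-selection components might be expressed in policy, the inbound pipeline can default the api-version query parameter and select a backend from the request path. The api-version value, the path heuristic, and the backend IDs below are illustrative assumptions, not Uniper’s production values.

```xml
<!-- Illustrative path construction: append a default api-version when the
     caller omits one (value is a placeholder) -->
<choose>
    <when condition='@(!context.Request.Url.Query.ContainsKey("api-version"))'>
        <set-query-parameter name="api-version" exists-action="override">
            <value>2025-01-01-preview</value>
        </set-query-parameter>
    </when>
</choose>
<!-- Illustrative backend selection: route by provider inferred from the path.
     The IDs refer to API Management backends, which can be load-balanced
     pools with circuit breakers configured -->
<choose>
    <when condition='@(context.Request.Url.Path.Contains("gemini"))'>
        <set-backend-service backend-id="gemini-backend" />
    </when>
    <otherwise>
        <set-backend-service backend-id="foundry-openai-pool" />
        <!-- Authenticate to Azure AI services with the gateway's managed identity -->
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" />
    </otherwise>
</choose>
```

A richer implementation could weigh cost, capacity, or model performance in the routing condition rather than the request path alone.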
"The collaboration between the Uniper and Microsoft’s AI and API Management teams on delivering the unified AI gateway has been exceptional. Together, we've built a robust solution that provides the flexibility to rapidly adapt to fast-paced advancements in the AI sphere, while maintaining the highest standards of security, resilience, and governance. This partnership has enabled us to deliver enterprise-grade AI solutions that our customers can trust and scale with confidence."
~ Ian Beeson – Uniper, API Centre of Excellence Lead
Uniper’s results: Business and operational impact
For Uniper, adopting the Unified AI Gateway pattern has proven to be a strategic enabler for scaling AI adoption with API Management. Uniper reports significant improvements across governance, efficiency, developer experience, and cost management:
- Centralized AI security and governance
o Real-time content filtering – Uniper can detect, log, and alert on content filter violations.
o Centralized audit and traceability – All AI requests and responses are centrally logged, enabling unified auditing and tracing.
- Operational efficiency
o Reduction in API definitions – Uniper estimates an 85% API definition reduction, moving from managing seven API definitions per environment (Development, Test, and Production) to a single universal wildcard API definition per environment.
o Feature deployment speed – Uniper delivers AI capabilities 60–180 days faster, enabled by immediate feature availability and the elimination of reliance on API schema updates and migrations.
o AI service availability – Uniper achieves 99.99% availability for AI services, enabled through circuit breakers and multi‑regional backend routing.
o Centralized ownership and maintenance – API management responsibilities are now consolidated under a single team.
- Improved developer experience
o Immediate feature availability – New AI capabilities are available immediately without requiring API definition updates, eliminating the previous 2–6-month delay before new features could be shared with Uniper’s developers.
o Automatic API schema compatibility – Both Microsoft and third-party provider API updates no longer require migrations to new or updated API definitions. Previously, Uniper’s developers had to migrate for each update.
o Consistent API interface with equivalent SDK support – A unified API surface across all AI services simplifies development and integration for Uniper’s developers.
o Equivalent request performance – Uniper validated that request performance through the Unified AI Gateway pattern is equivalent to the conventional API definition approach, based on comparing the time a request is received by the gateway to the time it is sent to the backend.
- AI cost management
o Token consumption visibility – Uniper uses detailed usage and token level metrics to enable a charge‑back model.
o Automated cost controls – Uniper enforces costs through configurable quotas and limits at both the AI gateway and backend AI service levels.
o Optimized model routing – Uniper dynamically routes requests to the most cost-effective models based on their policy.
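A minimal sketch of what gateway-level cost controls might look like, using API Management’s built-in llm-token-limit and llm-emit-token-metric policies. The counter key, threshold, header name, and metric namespace below are illustrative placeholders, not Uniper’s actual values.

```xml
<!-- Illustrative per-subscription token limiting; the tokens-per-minute
     threshold is a placeholder value -->
<llm-token-limit counter-key="@(context.Subscription.Id)"
                 tokens-per-minute="10000"
                 estimate-prompt-tokens="true"
                 remaining-tokens-header-name="x-remaining-tokens" />
<!-- Illustrative token metrics, dimensioned for charge-back reporting -->
<llm-emit-token-metric namespace="ai-gateway">
    <dimension name="API ID" />
    <dimension name="Subscription ID" />
</llm-emit-token-metric>
```

Emitting token metrics per subscription is one way to support the charge-back model described above, since consumption can then be attributed to individual teams.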
“The Unified AI Gateway pattern has fundamentally changed how we scale and govern AI across the enterprise. By consolidating AI access behind a single, policy-driven Azure API Management layer, we’ve reduced operational complexity while improving security, resilience, and developer experience. Most importantly, this approach allows us to adopt new models and capabilities at the pace the AI ecosystem demands—without compromising performance, availability, or governance.”
~ Hinesh Pankhania – Uniper, Head of Cloud Engineering & CCoE
When to use this pattern
The Unified AI Gateway pattern is most beneficial when organizations experience growing AI service complexity. Consider using the Unified AI Gateway pattern when:
- Multiple AI service providers: Your organization integrates with various AI service providers (Microsoft Foundry, Google Gemini, etc.)
- Frequent model/API changes: New models/APIs need to be regularly added or existing ones updated
- Dynamic routing needs: Your organization requires dynamic backend selection based on capacity, cost, or performance
When not to use this pattern: If you expect a limited number of models/API definitions with minimal ongoing changes, following the conventional approach may be simpler to implement and maintain. The additional implementation and maintenance effort required by the Unified AI Gateway pattern should be weighed against the management overhead it is intended to reduce. Refer to the next section for details on implementing the Unified AI Gateway pattern, including how the request and response pipeline is built using API Management policy fragments.
Get started
Get started by exploring a simplified sample that demonstrates the Unified AI Gateway pattern: Azure-Samples/APIM-Unified-AI-Gateway-Sample. The sample shows how to route requests to multiple AI models through a single API Management endpoint, including Phi‑4, GPT‑4.1, and GPT‑4.1‑mini from Microsoft Foundry, as well as Google Gemini 2.5 Flash‑Lite. It uses a universal wildcard API definition (/*) across GET, POST, PUT, and DELETE operations, routing all requests through a unified, policy-driven pipeline built with policy fragments to ensure consistent security, dynamic routing, load balancing, rate limiting, and comprehensive logging and monitoring.
The Unified AI Gateway pattern is designed to be extensible, allowing organizations to add support for additional API types, models, versions, etc. to meet their unique requirements through minimal updates to policy fragments. Each policy fragment is designed as a modular component with a single, well-defined responsibility. This modular design enables targeted customization, such as adding customized token tracking, without impacting the rest of the pipeline.
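As one example of this kind of targeted customization, a usage-tracking fragment could be added as a self-contained unit. This is an illustrative sketch only; the fragment name, trace source, and message format are assumptions, not part of the published sample.

```xml
<!-- Illustrative policy fragment (policy fragments use a <fragment> root
     element) that traces request details for usage analysis; the trace
     source name is a placeholder -->
<fragment>
    <trace source="ai-gateway-usage" severity="information">
        <message>@($"path={context.Request.Url.Path}; subscription={context.Subscription.Id}")</message>
    </trace>
</fragment>
```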
Unified AI Gateway sample component diagram

Acknowledgments
We would like to recognize the following Uniper contributors for their design of the Unified AI Gateway pattern and their contributions to this blog post:
~ Hinesh Pankhania, Uniper – Head of Cloud Engineering and CCoE
~ Ian Beeson, Uniper – API Centre of Excellence Lead
~ Steve Atkinson – Freelance AI Architect and AI Engineering Lead (Contract)