enterprise integration
Introducing GenAI Gateway Capabilities in Azure API Management
We are thrilled to announce GenAI Gateway capabilities in Azure API Management – a set of features designed specifically for GenAI use cases.

Azure OpenAI Service offers a diverse set of tools, providing access to advanced models such as GPT-3.5 Turbo, GPT-4, and GPT-4 Vision, enabling developers to build intelligent applications that can understand, interpret, and generate human-like text and images. One of the main resources in Azure OpenAI is tokens. Azure OpenAI assigns quota to your model deployments, expressed in tokens per minute (TPM), which is then distributed across your model consumers, such as different applications, developer teams, or departments within the company.

Starting with a single application, Azure makes it easy to connect your app to Azure OpenAI: your intelligent application calls Azure OpenAI directly using an API key, with a TPM limit configured at the model deployment level. However, as your application portfolio grows, you end up with multiple apps calling one or more Azure OpenAI endpoints, deployed as pay-as-you-go or Provisioned Throughput Unit (PTU) instances. That comes with certain challenges:

- How can we track token usage across multiple applications?
- How can we cross-charge multiple applications or teams that use Azure OpenAI models?
- How can we make sure that a single app does not consume the whole TPM quota, leaving other apps with no option to use Azure OpenAI models?
- How can we make sure that the API key is securely distributed across multiple applications?
- How can we distribute load across multiple Azure OpenAI endpoints?
- How can we make sure that PTUs are used first before falling back to pay-as-you-go instances?

To tackle these operational and scalability challenges, Azure API Management has built a set of GenAI gateway capabilities:

- Azure OpenAI Token Limit policy
- Azure OpenAI Emit Token Metric policy
- Load balancer and circuit breaker
- Import Azure OpenAI as an API
- Azure OpenAI Semantic Caching policy (in public preview)

Azure OpenAI Token Limit Policy
The Azure OpenAI Token Limit policy allows you to manage and enforce limits per API consumer based on their usage of Azure OpenAI tokens. With this policy you can set limits expressed in tokens per minute (TPM). It offers the flexibility to assign token-based limits on any counter key, such as subscription key, IP address, or any other arbitrary key defined through a policy expression. The policy can also pre-calculate prompt tokens on the Azure API Management side, avoiding unnecessary requests to the Azure OpenAI backend when the prompt alone already exceeds the limit. Learn more about this policy here.

Azure OpenAI Emit Token Metric Policy
The Azure OpenAI Emit Token Metric policy lets you send token usage metrics to Azure Application Insights, providing an overview of the utilization of Azure OpenAI models across multiple applications or API consumers. The policy captures prompt, completion, and total token usage and sends these metrics to the Application Insights namespace of your choice. You can also configure or select from pre-defined dimensions to split the token usage metrics, enabling granular analysis by subscription ID, IP address, or any custom dimension of your choice. Learn more about this policy here.
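To make the consumer experience behind these two policies concrete, here is a minimal sketch of an intelligent application calling Azure OpenAI through the API Management gateway with the OpenAI Python SDK. The gateway URL, the assumption that the imported API uses the standard "openai" path suffix, the deployment name, and the use of the APIM subscription key in the api-key header are illustrative assumptions, not details from this post.

```python
# Minimal sketch (assumptions noted above): an app calling Azure OpenAI via the APIM gateway.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://contoso-apim.azure-api.net",  # hypothetical APIM gateway URL; assumes the "openai" API path suffix
    api_key="<apim-subscription-key>",  # identifies the consumer for token limits and metric dimensions
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4",  # deployment name behind the gateway
    messages=[{"role": "user", "content": "Summarize the benefits of an API gateway."}],
)
print(response.choices[0].message.content)
# If this consumer exceeds its tokens-per-minute limit, the gateway can respond with
# 429 Too Many Requests before the call ever reaches the Azure OpenAI backend.
```

The same subscription key that the Token Limit policy counts against can also be emitted as a dimension by the Emit Token Metric policy, so per-consumer usage shows up directly in Application Insights.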
Load Balancer and Circuit Breaker
The load balancer and circuit breaker features allow you to spread load across multiple Azure OpenAI endpoints. With support for round-robin, weighted (new), and priority-based (new) load balancing, you can define a load distribution strategy that matches your specific requirements. Define priorities within the load balancer configuration to ensure optimal utilization of specific Azure OpenAI endpoints, particularly those purchased as PTUs. If a backend is disrupted, a circuit breaker mechanism kicks in, seamlessly transitioning to lower-priority instances based on predefined rules. The updated circuit breaker now features dynamic trip duration, using values from the retry-after header provided by the backend. This ensures precise and timely recovery of the backends, maximizing the utilization of your priority backends. Learn more about the load balancer and circuit breaker here.

Import Azure OpenAI as an API
The new Import Azure OpenAI as an API experience in Azure API Management provides an easy, single-click way to import your existing Azure OpenAI endpoints as APIs. It streamlines onboarding by automatically importing the OpenAPI schema for Azure OpenAI and setting up authentication to the Azure OpenAI endpoint using managed identity, removing the need for manual configuration. Within the same experience, you can pre-configure Azure OpenAI policies, such as token limit and emit token metric, for a swift and convenient setup. Learn more about Import Azure OpenAI as an API here.

Azure OpenAI Semantic Caching Policy
The Azure OpenAI Semantic Caching policy lets you optimize token usage by leveraging semantic caching, which stores completions for prompts with similar meaning. The semantic caching mechanism uses Azure Cache for Redis Enterprise or any other external cache that is compatible with RediSearch and onboarded to Azure API Management. Using the Azure OpenAI Embeddings model, the policy identifies semantically similar prompts and stores their respective completions in the cache. This enables completion reuse, resulting in reduced token consumption and improved response performance. Learn more about the semantic caching policy here.
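As a rough illustration of what semantic caching looks like from the consumer side, the sketch below sends two semantically similar prompts and compares response times. It assumes the semantic caching policies are already configured on the API; the gateway URL and deployment name are hypothetical, as in the earlier snippet.

```python
# Consumer-side sketch: observing semantic caching through the APIM gateway.
# Assumes the azure-openai-semantic-cache lookup/store policies are configured on the API.
import time
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://contoso-apim.azure-api.net",  # hypothetical APIM gateway URL
    api_key="<apim-subscription-key>",
    api_version="2024-02-01",
)

for prompt in ["What is the capital of France?",
               "Which city is the capital of France?"]:  # semantically similar prompts
    start = time.perf_counter()
    r = client.chat.completions.create(
        model="gpt-4",  # deployment name behind the gateway
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"{time.perf_counter() - start:.2f}s  {r.choices[0].message.content[:60]}")
# The second prompt can be served from the cache, typically returning faster and
# consuming no completion tokens on the Azure OpenAI backend.
```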
Get Started with GenAI Gateway Capabilities in Azure API Management
We're excited to introduce these GenAI gateway capabilities in Azure API Management, designed to empower developers to efficiently manage and scale applications that leverage Azure OpenAI services. Get started today and bring your intelligent application development to the next level with Azure API Management.

Microsoft named a Leader in 2023 Gartner® Magic Quadrant™ for API Management
We're thrilled to announce that Gartner has once again recognized Microsoft as a Leader in the 2023 Magic Quadrant for API Management, marking the fourth consecutive year of this recognition. We believe our continued recognition as a Leader is a testament to our deep customer engagements.
Azure API Management: Your Auth Gateway For MCP Servers
The Model Context Protocol (MCP) is quickly becoming the standard for integrating Tools 🛠️ with Agents 🤖, and Azure API Management is at the forefront, ready to support this open-source protocol 🚀. You may have already encountered discussions about MCP, so let's clarify some key concepts:

- Model Context Protocol (MCP): a standardized way (a protocol) for AI models to interact with external tools, either reading data or performing actions, and to enrich the context for any language model.
- AI Agents/Assistants: autonomous LLM-powered applications that can use tools to connect to the external services required to accomplish tasks on behalf of users.
- Tools: components made available to agents, allowing them to interact with external systems, perform computation, and take actions to achieve specific goals.
- Azure API Management: as a platform-as-a-service, API Management supports the complete API lifecycle, enabling organizations to create, publish, secure, and analyze APIs with built-in governance, security, analytics, and scalability.

New Cool Kid in Town - MCP
AI agents are becoming widely adopted due to enhanced Large Language Model (LLM) capabilities. However, even the most advanced models face limitations due to their isolation from external data. Each new data source requires a custom implementation to extract, prepare, and make data accessible to the model(s), which means a lot of heavy lifting. Anthropic developed an open-source standard, the Model Context Protocol (MCP), to connect your agents to external data sources, whether local (databases or computer files) or remote services (systems available over the internet, for example through APIs).

- MCP Hosts: LLM applications such as chat apps or AI assistants in your IDE (like GitHub Copilot in VS Code) that need to access external capabilities
- MCP Clients: protocol clients that maintain 1:1 connections with servers, inside the host application
- MCP Servers: lightweight programs that each expose specific capabilities and provide context, tools, and prompts to clients
- MCP Protocol: the transport layer in the middle

At its core, MCP follows a client-server architecture in which a host application can connect to multiple servers. Whenever your MCP host or client needs a tool, it connects to the MCP server, and the MCP server in turn connects to, for example, a database or an API. MCP hosts and servers communicate with each other through the MCP protocol. You can create your own custom MCP servers that connect to your own or your organization's data sources; a minimal sketch of such a server follows the next section. For a quick start, please visit our GitHub repository to learn how to build a remote MCP server using Azure Functions without authentication: https://aka.ms/mcp-remote

Remote vs. Local MCP Servers
The MCP standard supports two modes of operation:

- Remote MCP servers: MCP clients connect to MCP servers over the internet, establishing a connection using HTTP and Server-Sent Events (SSE), and authorizing the MCP client's access to resources on the user's account using OAuth.
- Local MCP servers: MCP clients connect to MCP servers on the same machine, using stdio as the local transport method.
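To make the server side concrete, here is the minimal sketch of a custom MCP server mentioned above, built with the Python MCP SDK (FastMCP). The server name, tool, and inventory data are purely illustrative and are not taken from the linked samples.

```python
# Minimal custom MCP server sketch using the Python MCP SDK (FastMCP).
# The server name, tool, and stubbed data are illustrative only.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("inventory-tools")

@mcp.tool()
def get_stock_level(sku: str) -> int:
    """Return the current stock level for a product SKU (stubbed for this example)."""
    return {"SKU-001": 42, "SKU-002": 7}.get(sku, 0)

if __name__ == "__main__":
    # stdio is the local transport; a remote MCP server would typically be exposed
    # over HTTP/SSE behind a gateway such as Azure API Management.
    mcp.run()
```

Run locally, such a server speaks stdio to hosts like VS Code; exposed remotely over SSE, it is exactly the kind of endpoint you would want to publish behind a gateway.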
Azure API Management as the AI Auth Gateway
Now that we know MCP servers can connect to remote services through an API, the question arises: how can we expose our remote MCP servers in a secure and scalable way? This is where Azure API Management comes in, giving us a way to securely and safely expose tools as MCP servers.

Azure API Management provides:
- Security: AI agents often need to access sensitive data. API Management, acting as a remote MCP proxy, safeguards organizational data through authentication and authorization.
- Scalability: as the number of LLM interactions and external tool integrations grows, API Management ensures the system can handle the load.

Security remains a critical piece of building MCP servers, because agents need to securely connect to protected endpoints (tools) to perform certain actions or read protected data. When building remote MCP servers, you need a way to let users log in (authentication) and to let them grant the MCP client access to resources on their account (authorization).

MCP - Current Authorization Challenges (state: 4/10/2025)
Recent changes in MCP authorization have sparked significant debate within the community.

🔍 Key challenges with the authorization changes: the MCP server is now treated as both a resource server AND an authorization server. This dual role has fundamental implications for MCP server developers and runtime operations.

💡 Our solution: to address these challenges, we recommend using Azure API Management as your authorization gateway for remote MCP servers.

🔗 For an enterprise-ready solution, please check out our azd up sample repo to learn how to build a remote MCP server using Azure API Management as your authentication gateway: https://aka.ms/mcp-remote-apim-auth

The Authorization Flow
The workflow involves three core components: the MCP client, the APIM gateway, and the MCP server, with Microsoft Entra managing authentication (AuthN) and authorization (AuthZ). Using the OAuth protocol, the client starts by calling the APIM gateway, which redirects the user to Entra for login and consent. Once authenticated, Entra provides an access token to the gateway, which then exchanges a code with the client to generate an MCP server token. This token allows the client to communicate securely with the server via the gateway, ensuring user validation and scope verification. Finally, the MCP server establishes a session key for ongoing communication through a dedicated message endpoint. Diagram source: https://aka.ms/mcp-remote-apim-auth-diagram

Conclusion
Azure API Management (APIM) is an essential tool for enterprise customers looking to integrate AI models with external tools using the Model Context Protocol (MCP). In this blog, we've emphasized the simplicity of connecting AI agents to various data sources through MCP, streamlining previously complex implementations. Given the critical role of secure access to platforms and services for AI agents, APIM offers robust solutions for managing OAuth tokens and ensuring secure access to protected endpoints, making it an invaluable asset for enterprises despite the challenges of authentication.

API Management: An Enterprise Solution for Securing MCP Servers
Azure API Management is designed to help you securely expose your remote MCP servers. MCP servers are still very new, and as the technology evolves, API Management provides an enterprise-ready solution that will evolve with it. Stay tuned for further feature announcements soon!

Acknowledgments
This post and the work behind it were made possible thanks to the hard work and dedication of our incredible team.
Special thanks to Pranami Jhawar, Julia Kasper, Julia Muiruri, Annaji Sharma Ganti, Jack Pa, Chaoyi Yuan, and Alex Vieira for their invaluable contributions.

Additional Resources
MCP client-server integration with APIM as AI gateway:
- Blog post: https://aka.ms/remote-mcp-apim-auth-blog
- Sequence diagram: https://aka.ms/mcp-remote-apim-auth-diagram
- APIM lab: https://aka.ms/ai-gateway-lab-mcp-client-auth
- Python: https://aka.ms/mcp-remote-apim-auth
- .NET: https://aka.ms/mcp-remote-apim-auth-dotnet
- On-Behalf-Of authorization: https://aka.ms/mcp-obo-sample

3rd-party APIs – backend auth via Credential Manager:
- Blog post: https://aka.ms/remote-mcp-apim-lab-blog
- APIM lab: https://aka.ms/ai-gateway-lab-mcp
- YouTube video: https://aka.ms/ai-gateway-lab-demo
Scaling Logic Apps Standard – Sustained Event Processing System
In the previous blog in this series, we discussed how Logic Apps Standard can be used to process high-throughput batched data. In this blog, we showcase an application built with Azure Integration Services components that streamlines event processing. Events are sent to an event hub at a sustained rate, and the Logic App workflow processes each event by initiating a series of actions. These actions process the relevant event information using variables, conditions, scopes, and Azure Functions, and then send the status of the processed event to another event hub. We discuss two scenarios: one focused on high throughput, and one that handles throttling limits to protect the downstream services.

High throughput event processing Logic App
An influx of 4 million events is ingested in a span of 8 hours. The workflow uses the Event Hubs built-in trigger, so events are promptly picked up and processed at a rate on par with the ingress rate. Each workflow run orchestrates several data transformation steps, collaborates with other services to enrich each event payload with additional data, and finally sends it to another event hub.

Event processing Logic App with throttling limits on downstream services
Now let's handle the scenario where downstream services have throttling limits. We use a parent workflow that reads events from the event hub and randomly sends them to a child workflow that has a concurrency limit of 100 for processing. During this test, 10K events per minute were pushed to the event hub for 3-4 hours, and it took close to 5 hours to complete processing because of the slowdown introduced by the concurrency limit.

Tests setup

| | High throughput event processing Logic App | Event processing Logic App with throttling on downstream services |
| --- | --- | --- |
| Number of workflows | 1 | 11 (1 parent, 10 child workflows) |
| Triggers | Event Hubs | Event Hubs |
| Actions | Event Hubs, variables (3), Functions (3), compose (2), conditions (6), scopes (8) | Event Hubs, variables (3), Functions (3), compose (2), conditions (6), scopes (8) |
| Number of storage accounts | 4 | 4 |
| Prewarmed instances | 20 | 20 |
| WS plan | WS3 | WS3 |
| Max scale settings | 100 | 100 |
| Host settings: Event Hubs extension event batch size | `"eventHubs": { "maxEventBatchSize": 100 }` | `"eventHubs": { "maxEventBatchSize": 100 }` |
| Exponential retry on nested workflow call | N/A | `"retryPolicy": { "count": 4, "interval": "PT20S", "maximumInterval": "PT1H", "minimumInterval": "PT10S", "type": "exponential" }` |
| Concurrency | N/A | 100 (on the HTTP trigger in the child workflow) |
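For context on how such a sustained ingress might be produced, the snippet below is an illustrative load generator, not the actual test harness, that pushes roughly 10K events per minute to the ingress event hub using the azure-eventhub SDK. The connection string and hub name are placeholders.

```python
# Illustrative load generator: sends batches of events to the ingress event hub
# at roughly 10K events per minute.
import json
import time

from azure.eventhub import EventData, EventHubProducerClient

producer = EventHubProducerClient.from_connection_string(
    conn_str="<event-hubs-namespace-connection-string>",  # placeholder
    eventhub_name="events-in",                            # hypothetical hub name
)

with producer:
    sent = 0
    while True:
        batch = producer.create_batch()
        for _ in range(100):  # matches the maxEventBatchSize used by the workflow trigger
            batch.add(EventData(json.dumps({"id": sent, "payload": "sample-event"})))
            sent += 1
        producer.send_batch(batch)
        time.sleep(0.6)  # 100 events every 0.6 s is roughly 10K events per minute
```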
Performance characteristics

High throughput event processing Logic App
- Event processing trigger rate: over 10K events/min received at a sustained rate for 8 hours.
- Action execution rate: a consistent 250K actions/min for the 8-hour load run.
- Job execution rate: a sustained average of 400K jobs/min for the whole run, with a maximum of 600K/min.
- Execution delay and instance scaling: the app had a prewarmed instance count of 20, took about 40 minutes to scale out to 40 instances, and used an average of 45 instances after the ramp-up. The 95th percentile execution delay stayed below 170 ms for the most part but increased to a maximum of 4 s during the compute ramp-up. This is expected, since more jobs were queued during the ramp-up period than were dequeued for processing.

(Charts: execution delay and instance count.)

Event processing Logic App with throttling limits on downstream services
- Action execution rate: 200-250K actions/min for the 4-hour load run, followed by close to an hour of draining the queued runs after the test was stopped. The action execution rate per workflow is 15K-20K actions/min.
- Run execution rate: 12K-15K runs/min for the 4-hour load run, followed by close to an hour of draining the queued runs after the test was stopped. The run execution rate per workflow is 700-800 runs/min.
- Job execution rate: close to 400K jobs/min for the whole run.
- Execution delay and instance scaling: the app had a prewarmed instance count of 20 and used a maximum of 32 instances after the ramp-up. The 95th percentile execution delay stayed below 300 ms for the most part but increased to a maximum of 600 ms during the compute ramp-up. This is expected, since more jobs were queued during the ramp-up period than were dequeued for processing.

(Charts: execution delay and instance count.)

Results Summary
This case study shows patterns for processing a sustained high-throughput load and for respecting the throttling limits of downstream services. 10K events per minute are processed by a single workflow, and the processing keeps pace with the ingress rate. When concurrency knobs are used to slow down processing to protect the downstream services, there is a delay in processing events: a 4-hour run takes close to 5 hours to complete.

| | High throughput event processing Logic App | Logic App with throttling limits on the downstream services |
| --- | --- | --- |
| Total number of events processed | 4,800,000 | 1,538,200 |
| Total processing time | 8 hours | 4 hours |
| Triggers | 10K/min sustained event reads | 10K/min sustained event reads |
| Actions | 250K actions/min sustained rate; total actions executed: 115.2M | 150K actions/min sustained rate; total actions executed: 57M |
| Jobs | 600K/min job rate at peak; 400K/min sustained | 490K/min job rate at peak; 400K/min sustained |
| Execution delay | 95th percentile increased up to 4 s during scale-out and came back below 170 ms at sustained load | 95th percentile increased up to 600 ms during scale-out and came back below 300 ms for most of the run |
| Scale out | Scaling from 20 to 40 instances took about 40 minutes, with a maximum of 52 instances | A maximum of 32 instances used over the course of the run |

References
Refer to this blog for using multiple storage accounts in the app.
Announcing General Availability of Workspaces in Azure API Management
We are excited to announce the general availability of workspaces in Azure API Management! Workspaces enable organizations to manage APIs more productively, securely, and reliably using a federated approach.
Expose REST APIs as MCP servers with Azure API Management and API Center (now in preview)
As AI-powered agents and large language models (LLMs) become central to modern application experiences, developers and enterprises need seamless, secure ways to connect these models to real-world data and capabilities. Today, we're excited to introduce two powerful preview capabilities in the Azure API Management platform:

- Expose REST APIs in Azure API Management as remote Model Context Protocol (MCP) servers
- Discover and manage MCP servers using API Center as a centralized enterprise registry

Together, these updates help customers securely operationalize APIs for AI workloads and improve how APIs are managed and shared across organizations.

Unlocking the value of AI through secure API integration
While LLMs are incredibly capable, they are stateless and isolated unless connected to external tools and systems. The Model Context Protocol (MCP) is an open standard designed to bridge this gap by allowing agents to invoke tools, such as APIs, via a standardized JSON-RPC-based interface. With this release, Azure empowers you to operationalize your APIs for AI integration securely, observably, and at scale.

1. Expose REST APIs as MCP servers with Azure API Management
An MCP server exposes selected API operations to AI clients over JSON-RPC via HTTP or Server-Sent Events (SSE). These operations, referred to as "tools," can be invoked by AI agents through natural language prompts. With this new capability, you can expose your existing REST APIs in Azure API Management as MCP servers, without rebuilding or rehosting them.

Addressing common challenges
Before this capability, customers faced several challenges when implementing MCP support:
- Duplicating development efforts: building MCP servers from scratch often led to unnecessary work when existing REST APIs already provided much of the needed functionality.
- Security concerns: server trust (malicious servers could impersonate trusted ones) and credential management (self-hosted MCP implementations often had to manage sensitive credentials like OAuth tokens).
- Registry and discovery: without a centralized registry, discovering and managing MCP tools was manual and fragmented, making it hard to scale securely across teams.

API Management now addresses these concerns by serving as a managed, policy-enforced hosting surface for MCP tools, offering centralized control, observability, and security.

Benefits of using Azure API Management with MCP
By exposing MCP servers through Azure API Management, customers gain:
- Centralized governance for API access, authentication, and usage policies
- Secure connectivity using OAuth 2.0 and subscription keys
- Granular control over which API operations are exposed to AI agents as tools
- Built-in observability through APIM's monitoring and diagnostics features

How it works
1. MCP servers: in your API Management instance, navigate to MCP servers.
2. Choose an API: select + Create a new MCP server and choose the REST API you wish to expose.
3. Configure the MCP server: select the API operations you want to expose as tools. These can be all or a subset of your API's methods.
4. Test and integrate: use tools like MCP Inspector or Visual Studio Code (in agent mode) to connect, test, and invoke the tools from your AI host; a sketch of a simple test client follows this list.
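For step 4, a simple test client could look like the following sketch built with the Python MCP SDK. The endpoint URL, subscription-key header, and tool name are assumptions for illustration rather than values from this announcement.

```python
# Minimal sketch of an MCP test client for an APIM-hosted MCP server, using the Python MCP SDK.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def main() -> None:
    url = "https://contoso-apim.azure-api.net/my-api-mcp/sse"      # hypothetical MCP endpoint
    headers = {"Ocp-Apim-Subscription-Key": "<subscription-key>"}  # assumed auth header
    async with sse_client(url, headers=headers) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            tools = await session.list_tools()
            print("Exposed tools:", [tool.name for tool in tools.tools])
            result = await session.call_tool("getOrderById", {"orderId": "42"})  # hypothetical tool
            print(result.content)

asyncio.run(main())
```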
Getting started and availability
This feature is now in public preview and is being gradually rolled out to early access customers. To use the MCP server capability in Azure API Management:

Prerequisites
- Your APIM instance must be on a SKUv1 tier: Premium, Standard, or Basic
- Your service must be enrolled in the AI Gateway early update group (activation may take up to 2 hours)
- Use the Azure portal with a feature flag: append ?Microsoft_Azure_ApiManagement=mcp to your portal URL to access the MCP server configuration experience

Note: support for SKUv2 and broader availability will follow in upcoming updates. Full setup instructions and test guidance can be found via aka.ms/apimdocs/exportmcp.

2. Centralized MCP registry and discovery with Azure API Center
As enterprises adopt MCP servers at scale, the need for a centralized, governed registry becomes critical. Azure API Center now provides this capability, serving as a single, enterprise-grade system of record for managing MCP endpoints. With API Center, teams can:
- Maintain a comprehensive inventory of MCP servers.
- Track version history, ownership, and metadata.
- Enforce governance policies across environments.
- Simplify compliance and reduce operational overhead.

API Center also addresses enterprise-grade security by allowing administrators to define who can discover, access, and consume specific MCP servers, ensuring only authorized users can interact with sensitive tools.

To support developer adoption, API Center includes:
- Semantic search and a modern discovery UI.
- Easy filtering based on capabilities, metadata, and usage context.
- Tight integration with Copilot Studio and GitHub Copilot, enabling developers to use MCP tools directly within their coding workflows.

These capabilities reduce duplication, streamline workflows, and help teams securely scale MCP usage across the organization.

Getting started
This feature is now in preview and accessible to customers:
- https://aka.ms/apicenter/docs/mcp
- AI Gateway Lab | MCP Registry

3. What's next
These new previews are just the beginning. We're already working on:
- Azure API Management (APIM): passthrough MCP server support. We're enabling APIM to act as a transparent proxy between your APIs and AI agents, with no custom server logic needed. This will simplify onboarding and reduce operational overhead.
- Azure API Center (APIC): deeper integration with Copilot Studio and VS Code. Today, developers must perform manual steps to surface API Center data in Copilot workflows. We're working to make this experience more visual and seamless, allowing developers to discover and consume MCP servers directly from familiar tools like VS Code and Copilot Studio.

For questions or feedback, reach out to your Microsoft account team or visit:
- Azure API Management documentation
- Azure API Center documentation

— The Azure API Management & API Center Teams
Scaling Logic App Standard – High Throughput Batched Data Processing System
In the previous installment of this blog post series, we discussed how Logic Apps Standard can be configured to scale for high-throughput workloads. In this blog, we showcase an application developed with Azure Integration Services components that streamlines the processing of a large number of orders, received as a batch that must be processed within a predetermined amount of time.
Announcing Public Preview of API Management WordPress plugin to build customized developer portals
The Azure API Management WordPress plugin enables our customers to leverage the power of WordPress to build their own unique developer portal. API managers and administrators can bring up a new developer portal in a matter of minutes, customize the theme and layout, add stylesheets, or localize the portal into different languages.