azure api management

116 Topics

Azure app platform at Ignite 2025: New innovations for all your apps and agents
Learn more about the latest Azure app platform innovations at Microsoft Ignite 2025 that help you build, modernize, and run all your apps and agents.
NagaSurendran
Nov 18, 2025 Place Apps on Azure Blog
740Views
1like
0Comments
Reimagining AI Ops with Azure SRE Agent: New Automation, Integration, and Extensibility features
Azure SRE Agent offers intelligent and context aware automation for IT operations. Enhanced by customer feedback from our preview, the SRE Agent has evolved into an extensible platform to automate and manage tasks across Azure and other environments. Built on an Agentic DevOps approach - drawing from proven practices in internal Azure operations - the Azure SRE Agent has already saved over 20,000 engineering hours across Microsoft product teams operations, delivering strong ROI for teams seeking sustainable AIOps. An Operations Agent that adapts to your playbooks Azure SRE Agent is an AI powered operations automation platform that empowers SREs, DevOps, IT operations, and support teams to automate tasks such as incident response, customer support, and developer operations from a single, extensible agent. Its value proposition and capabilities have evolved beyond diagnosis and mitigation of Azure issues, to automating operational workflows and seamless integration with the standards and processes used in your organization. SRE Agent is designed to automate operational work and reduce toil, enabling developers and operators to focus on high-value tasks. By streamlining repetitive and complex processes, SRE Agent accelerates innovation and improves reliability across cloud and hybrid environments. In this article, we will look at what’s new and what has changed since the last update. What’s New: Automation, Integration, and Extensibility Azure SRE Agent just got a major upgrade. From no-code automation to seamless integrations and expanded data connectivity, here’s what’s new in this release: No-code Sub-Agent Builder: Rapidly create custom automations without writing code. Flexible, event-driven triggers: Instantly respond to incidents and operational changes. Expanded data connectivity: Unify diagnostics and troubleshooting across more data sources. Custom actions: Integrate with your existing tools and orchestrate end-to-end workflows via MCP. Prebuilt operational scenarios: Accelerate deployment and improve reliability out of the box. Unlike generic agent platforms, Azure SRE Agent comes with deep integrations, prebuilt tools, and frameworks specifically for IT, DevOps, and SRE workflows. This means you can automate complex operational tasks faster and more reliably, tailored to your organization’s needs. Sub-Agent Builder: Custom Automation, No Code Required Empower teams to automate repetitive operational tasks without coding expertise, dramatically reducing manual workload and development cycles. This feature helps address the need for targeted automation, letting teams solve specific operational pain points without relying on one-size-fits-all solutions. Modular Sub-Agents: Easily create custom sub-agents tailored to your team’s needs. Each sub-agent can have its own instructions, triggers, and toolsets, letting you automate everything from outage response to customer email triage. Prebuilt System Tools: Eliminate the inefficiency of creating basic automation from scratch, and choose from a rich library of hundreds of built-in tools for Azure operations, code analysis, deployment management, diagnostics, and more. Custom Logic: Align automation to your unique business processes by defining your automation logic and prompts, teaching the agent to act exactly as your workflow requires. Flexible Triggers: Automate on Your Terms Invoke the agent to respond automatically to mission-critical events, not wait for manual commands. This feature helps speed up incident response and eliminate missed opportunities for efficiency. Multi-Source Triggers: Go beyond chat-based interactions, and trigger the agent to automatically respond to Incident Management and Ticketing systems like PagerDuty and ServiceNow, Observability Alerting systems like Azure Monitor Alerts, or even on a cron-based schedule for proactive monitoring and best-practices checks. Additional trigger sources such as GitHub issues, Azure DevOps pipelines, email, etc. will be added over time. This means automation can start exactly when and where you need it. Event-Driven Operations: Integrate with your CI/CD, monitoring, or support systems to launch automations in response to real-world events - like deployments, incidents, or customer requests. Vital for reducing downtime, it ensures that business-critical actions happen automatically and promptly. Expanded Data Connectivity: Unified Observability and Troubleshooting Integrate data, enabling comprehensive diagnostics and troubleshooting and faster, more informed decision-making by eliminating silos and speeding up issue resolution. Multiple Data Sources: The agent can now read data from Azure Monitor, Log Analytics, and Application Insights based on its Azure role-based access control (RBAC). Additional observability data sources such as Dynatrace, New Relic, Datadog, and more can be added via the Remote Model Context Protocol (MCP) servers for these tools. This gives you a unified view for diagnostics and automation. Knowledge Integration: Rather than manually detailing every instruction in your prompt, you can upload your Troubleshooting Guide (TSG) or Runbook directly, allowing the agent to automatically create an execution plan from the file. You may also connect the agent to resources like SharePoint, Jira, or documentation repositories through Remote MCP servers, enabling it to retrieve needed files on its own. This approach utilizes your organization’s existing knowledge base, streamlining onboarding and enhancing consistency in managing incidents. Azure SRE Agent is also building multi-agent collaboration by integrating with PagerDuty and Neubird, enabling advanced, cross-platform incident management and reliability across diverse environments. Custom Actions: Automate Anything, Anywhere Extend automation beyond Azure and integrate with any tool or workflow, solving the problem of limited automation scope and enabling end-to-end process orchestration. Out-of-the-Box Actions: Instantly automate common tasks like running azcli, kubectl, creating GitHub issues, or updating Azure resources, reducing setup time and operational overhead. Communication Notifications: The SRE Agent now features built-in connectors for Outlook, enabling automated email notifications, and for Microsoft Teams, allowing it to post messages directly to Teams channels for streamlined communication. Bring Your Own Actions: Drop in your own Remote MCP servers to extend the agent’s capabilities to any custom tool or workflow. Future-proof your agentic DevOps by automating proprietary or emerging processes with confidence. Prebuilt Operations Scenarios Address common operational challenges out of the box, saving teams time and effort while improving reliability and customer satisfaction. Incident Response: Minimize business impact and reduce operational risk by automating detection, diagnosis, and mitigation of your workload stack. The agent has built-in runbooks for common issues related to many Azure resource types including Azure Kubernetes Service (AKS), Azure Container Apps (ACA), Azure App Service, Azure Logic Apps, Azure Database for PostgreSQL, Azure CosmosDB, Azure VMs, etc. Support for additional resource types is being added continually, please see product documentation for the latest information. Root Cause Analysis & IaC Drift Detection: Instantly pinpoint incident causes with AI-driven root cause analysis including automated source code scanning via GitHub and Azure DevOps integration. Proactively detect and resolve infrastructure drift by comparing live cloud environments against source-controlled IaC, ensuring configuration consistency and compliance. Handle Complex Investigations: Enable the deep investigation mode that uses a hypothesis-driven method to analyze possible root causes. It collects logs and metrics, tests hypotheses with iterative checks, and documents findings. The process delivers a clear summary and actionable steps to help teams accurately resolve critical issues. Incident Analysis: The integrated dashboard offers a comprehensive overview of all incidents managed by the SRE Agent. It presents essential metrics, including the number of incidents reviewed, assisted, and mitigated by the agent, as well as those awaiting human intervention. Users can leverage aggregated visualizations and AI-generated root cause analyses to gain insights into incident processing, identify trends, enhance response strategies, and detect areas for improvement in incident management. Inbuilt Agent Memory: The new SRE Agent Memory System transforms incident response by institutionalizing the expertise of top SREs - capturing, indexing, and reusing critical knowledge from past incidents, investigations, and user guidance. Benefit from faster, more accurate troubleshooting, as the agent learns from both successes and mistakes, surfacing relevant insights, runbooks, and mitigation strategies exactly when needed. This system leverages advanced retrieval techniques and a domain-aware schema to ensure every on-call engagement is smarter than the last, reducing mean time to resolution (MTTR) and minimizing repeated toil. Automatically gain a continuously improving agent that remembers what works, avoids past pitfalls, and delivers actionable guidance tailored to the environment. GitHub Copilot and Azure DevOps Integration: Automatically triage, respond to, and resolve issues raised in GitHub or Azure DevOps. Integration with modern development platforms such as GitHub Copilot coding agent increases efficiency and ensures that issues are resolved faster, reducing bottlenecks in the development lifecycle. Ready to get started? Azure SRE Agent home page Product overview Pricing Page Pricing Calculator Pricing Blog Demo recordings Deployment samples What’s Next? Give us feedback: Your feedback is critical - You can Thumbs Up / Thumbs Down each interaction or thread, or go to the “Give Feedback” button in the agent to give us in-product feedback - or you can create issues or just share your thoughts in our GitHub repo at https://github.com/microsoft/sre-agent. We’re just getting started. In the coming months, expect even more prebuilt integrations, expanded data sources, and new automation scenarios. We anticipate continuous growth and improvement throughout our agentic AI platforms and services to effectively address customer needs and preferences. Let us know what Ops toil you want to automate next!
vyomnagrani
Nov 18, 2025 Place Apps on Azure Blog
981Views
0likes
0Comments
Running Self-hosted APIM Gateways in Azure Container Apps with VNet Integration
With Azure Container Apps we can run containerized applications, completely serverless. The platform itself handles all the orchestration needed to dynamically scale based on your set triggers (such as KEDA) and even scale-to-zero! I have been working a lot with customers recently on using Azure API Management (APIM) and the topic of how we can leverage Azure APIM to manage our internal APIs without having to expose a public IP and stay within compliance from a security standpoint, which leads to the use of a Self-Hosted Gateway. This offers a managed gateway deployed within their network, allowing a unified approach in managing their APIs while keeping all API communication in-network. The self-hosted gateway is deployed as a container and in this article, we will go through how to provision a self-hosted gateway on Azure Container Apps specifically. I assume there is already an Azure APIM instance provisioned and will dive into creating and configuring the self-hosted gateway on ACA. Prerequisites As mentioned, ensure you have an existing Azure API Management instance. We will be using the Azure CLI to configure the container apps in this walkthrough. To run the commands, you need to have the Azure CLI installed on your local machine and ensure you have the necessary permissions in your Azure subscription. Retrieve Gateway Deployment Settings from APIM First, we need to get the details for our gateway from APIM. Head over to the Azure portal and navigate to your API Management instance. - In the left menu, under Deployment and infrastructure, select Gateways. - Here, you'll find the gateway resource you provisioned. Click on it and go to Deployment. - You'll need to copy the Gateway Token and Configuration endpoint values. (these tell the self-hosted gateway which APIM instance and Gateway to register under) Create a Container Apps Environment Next, we need to create a Container Apps environment. This is where we will create the container app in which our self-hosted gateway will be hosted. Using Azure CLI: Create our VNet and Subnet for our ACA Environment As we want access to our internal APIs, when we create the container apps environment, we need to have the VNet created with a subnet available. Note: If we’re using Workload Profiles (we will in this walkthrough), then we need to delegate the subnet to Microsoft.App/environments. # Create the vnet az network vnet create --resource-group rgContosoDemo \ --name vnet-contoso-demo \ --location centralUS \ --address-prefix 10.0.0.0/16 # Create the subnet az network vnet subnet create --resource-group rgContosoDemo \ --vnet-name vnet-contoso-demo \ --name infrastructure-subnet \ --address-prefixes 10.0.0.0/23 # If you are using a workload profile (we are for this walkthrough) then delegate the subnet az network vnet subnet update --resource-group rgContosoDemo \ --vnet-name vnet-contoso-demo \ --name infrastructure-subnet \ --delegations Microsoft.App/environments Create the Container App Environment in out VNet az containerapp env create --name aca-contoso-env \ --resource-group rgContosoDemo \ --location centralUS \ --enable-workload-profiles Deploy the Self-Hosted Gateway to a Container App Creating the environment takes about 10 minutes and once complete, then comes the fun part—deploying the self-hosted gateway container image to a container app. Using Azure CLI: Create the Container App: az containerapp create --name aca-apim-demo-gateway \ --resource-group rgContosoDemo \ --environment aca-contoso-env \ --workload-profile-name "Consumption" \ --image "mcr.microsoft.com/azure-api-management/gateway:2.5.0" \ --target-port 8080 \ --ingress 'external' \ ---env-vars "config.service.endpoint"="<YOUR_ENDPOINT>" "config.service.auth"="<YOUR_TOKEN>" "net.server.http.forwarded.proto.enabled"="true" Here, you'll replace <YOUR_ENDPOINT> and <YOUR_TOKEN> with the values you copied earlier. Configure Ingress for the Container App: az containerapp ingress enable --name aca-apim-demo-gateway --resource-group rgContosoDemo --type external --target-port 8080 This command ensures that your container app is accessible externally. Verify the Deployment Finally, let's make sure everything is running smoothly. Navigate to the Azure portal and go to your Container Apps environment. Select the container app you created (aca-apim-demo-gateway) and navigate to Replicas to verify that it's running. You can use the status endpoint of the self-hosted gateway to determine if your gateway is running as well: curl -i https://aca-apim-demo-gateway.sillytreats-abcd1234.centralus.azurecontainerapps.io/status-012345678990abcdef Verify Gateway Health in APIM You can navigate in the Azure Portal to APIM and verify the gateway is showing up as healthy. Navigate to Deployment and Infrastructure, select Gateways then choose your Gateway. On the Overview page you’ll see the status of your gateway deployment. And that’s it! You've successfully deployed an Azure APIM self-hosted gateway in Azure Container Apps with VNet integration allowing access to your internal APIs with easy management from the APIM portal in Azure. This setup allows you to manage your APIs efficiently while leveraging the scalability and flexibility of Azure Container Apps. If you have any questions or need further assistance, feel free to ask. How are you feeling about this setup? Does it make sense, or is there anything you'd like to dive deeper into?
huntleyh
Nov 15, 2025 Place Apps on Azure Blog
1.7KViews
3likes
3Comments
From Timeouts to Triumph: Optimizing GPT-4o-mini for Speed, Efficiency, and Reliability
The Challenge Large-scale generative AI deployments can stretch system boundaries — especially when thousands of concurrent requests require both high throughput and low latency. In one such production environment, GPT-4o-mini deployments running under Provisioned Throughput Units (PTUs) began showing sporadic 408 (timeout) and 429 (throttling) errors. Requests that normally completed in seconds were occasionally hitting the 60-second timeout window, causing degraded experiences and unnecessary retries. Initial suspicion pointed toward PTU capacity limitations, but deeper telemetry revealed a different cause. What the Data Revealed Using Azure Data Explorer (Kusto), API Management (APIM) logs, and OpenAI billing telemetry, a detailed investigation uncovered several insights: Latency was not correlated with PTU utilization: PTU resources were healthy and performing within SLA even during spikes. Time-Between-Tokens (TBT) stayed consistently low (~8–10 ms): The model was generating tokens steadily. Excessive token output was the real bottleneck: Requests generating 6K–8K tokens simply required more time than allowed in the 60-second completion window. In short — the model wasn’t slow; the workload was oversized. The Optimization Opportunity The analysis opened a broader optimization opportunity: Balance token length with throughput targets. Introduce architectural patterns to prevent timeout or throttling cascades under load. Enforce automatic token governance instead of relying on client-side discipline. The Solution Three engineering measures delivered immediate impact: token optimization, spillover routing, and policy enforcement. Right-size the Token Budget Empirical throughput for GPT-4o-mini: ~33 tokens/sec → ~2K tokens in 60s. Enforced max_tokens = 2000 for synchronous requests. Enabled streaming responses for longer outputs, allowing incremental delivery without hitting timeout limits. Enable Spillover for Continuity Implemented multi-region spillover using Azure Front Door and APIM Premium gateways. When PTU queues reached capacity or 429s appeared, requests were routed to Standard deployments in secondary regions. The result: graceful degradation and uninterrupted user experience. Govern with APIM Policies Added inbound policies to inspect and adjust max_tokens dynamically. On 408/429 responses, APIM retried and rerouted traffic based on spillover logic. The Results After optimization, improvements were immediate and measurable: Latency Reduction: Significant improvement in end-to-end response times across high-volume workloads Reliability Gains: 408/429 errors fell from >1% to near zero. Cost Efficiency: Average token generation decreased by ~60%, reducing per-request costs. Scalability: Spillover routing ensured consistent performance during regional or capacity surges. Governance: APIM policies established a reusable token-control framework for future AI workloads. Lessons Learned Latency isn’t always about capacity: Investigate workload patterns before scaling hardware. Token budgets define the user experience: Over-generation can quietly break SLA compliance. Design for elasticity: Spillover and multi-region routing maintain continuity during spikes. Measure everything: Combine KQL telemetry, latency and token tracking for faster diagnostics. The Outcome By applying data-driven analysis, architectural tuning, and automated governance, the team turned an operational bottleneck into a model of consistent, scalable performance. The result: Faster responses. Lower costs. Higher trust. A blueprint for building resilient, high-throughput AI systems on Azure.
psundars
Oct 14, 2025 Place Apps on Azure Blog
336Views
4likes
0Comments
Fixed ip address for outbound calls from Azure APIM Standard V2
Hi, I recently ran a PoC deployment of Azure APIM Standard V2 Sku instead of our current Premium Classic instance. This worked well! Performance is great and I am able to route calls to an on-prem network ok using vnet-integration. However, one of the features we currently make use of with the Premium Classic instance is a fixed ip address for calls from APIM to 3rd parties. Is there a way to achieve this using Standard V2? We have tried a nat gateway with fixed ip on the same vnet but this does not seem to help.
Solved
BizTalkers
Sep 30, 2025 Place Azure Integration Services
77Views
0likes
1Comment
Developer Tier APIM + Self-hosted Gateway
Developer tier APIM enjoys many "premium" features such as vnet injection. Feature Consumption Developer Basic Basic v2 Standard Standard v2 Premium Premium v2 (preview) Microsoft Entra integration 1 No Yes No Yes Yes Yes Yes Yes Virtual network injection support No Yes No No No No Yes Yes Private endpoint support for inbound connections No Yes Yes No Yes Yes Yes No Outbound virtual network integration support No No No No No Yes No Yes Multi-region deployment No No No No No No Yes No Availability zones No No No No No No Yes No Multiple custom domain names for gateway No Yes No No No No Yes No Developer portal 2 No Yes Yes Yes Yes Yes Yes Yes Built-in cache No Yes Yes Yes Yes Yes Yes Yes External cache Yes Yes Yes Yes Yes Yes Yes Yes Autoscaling No No Yes No Yes No Yes No API analytics No Yes Yes Yes Yes Yes Yes Yes Self-hosted gateway 3 No Yes No No No No Yes No Workspaces No No No No No No Yes Yes TLS settings Yes Yes Yes Yes Yes Yes Yes Yes Client certificate authentication Yes Yes Yes Yes Yes Yes Yes Yes Policies 4 Yes Yes Yes Yes Yes Yes Yes Yes Credential manager Yes Yes Yes Yes Yes Yes Yes Yes Backup and restore No Yes Yes No Yes No Yes No Management over Git No Yes Yes No Yes No Yes No Azure Monitor metrics Yes Yes Yes Yes Yes Yes Yes Yes Azure Monitor and Log Analytics request logs No Yes Yes Yes Yes Yes Yes Yes Application Insights request logs Yes Yes Yes Yes Yes Yes Yes Yes Static IP No Yes Yes No Yes No Yes No Export API to Power Platform Yes Yes Yes Yes Yes Yes Yes Yes Export API to Postman Yes Yes Yes Yes Yes Yes Yes Yes Export API to MCP server (preview) No No Yes Yes Yes Yes Yes Yes Expose existing MCP server (preview) No No Yes Yes Yes Yes Yes Yes Feature-based comparison of Azure API Management tiers | Microsoft Learn It however comes with weaknesses like no SLA guaranteed since it's designed for non-production use cases and evaluations in the first place. For projects that benefit from the ROI of developer tier APIM and would like to have more control over APIM service availability, developer tier + self-hosted gateway might be an option. One example use case here is to provision an Azure VM and then setup self-hosted gateway on the VM. This way users are able to manage and maintain the underlying VM and avoid events like VM Guest OS upgrade during business hours that would have caused service disruptions on developer tier APIM. Just a quick thought...
reve
Sep 27, 2025 Place Azure PaaS Blog
188Views
0likes
0Comments
Azure User Expresses Concern
A customer opened ticket SR#2407190040010082 as their consumption sku APIM service was stuck updating: Now that the service has exited that "updating" status I am able to resume working with it. The concern I want to share with you is my concern with how the system responds to a certificate error and gets stuck in that "updating" state. We know that network and login activities can fail on occasion. When APIM responds by getting stuck in that state it cannot be updated and it cannot be deleted and recreated. This issue lasted for a day before APIM eventually emerged from that state for reasons I am unaware. I was powerless and had to keep going back to check. Yes, this case is resolved but I hope that this feedback can be shared with the team in the hopes that a fix or enhancement to better handle this situation can be implemented.
v-jafigueroa
Jul 10, 2025 Place Azure PaaS
352Views
5likes
1Comment
Calculating Chargebacks for Business Units/Projects Utilizing a Shared Azure OpenAI Instance
Azure OpenAI Service is at the forefront of technological innovation, offering REST API access to OpenAI's suite of revolutionary language models, including GPT-4, GPT-35-Turbo, and the Embeddings model series. Enhancing Throughput for Scale As enterprises seek to deploy OpenAI's powerful language models across various business units, they often require granular control over configuration and performance metrics. To address this need, Azure OpenAI Service is introducing dedicated throughput, a feature that provides a dedicated connection to OpenAI models with guaranteed performance levels. Throughput is quantified in terms of tokens per second (tokens/sec), allowing organizations to precisely measure and optimize the performance for both prompts and completions. The model of provisioned throughput provides enhanced management and adaptability for varying workloads, guaranteeing system readiness for spikes in demand. This capability also ensures a uniform user experience and steady performance for applications that require real-time responses. Resource Sharing and Chargeback Mechanisms Large organizations frequently provision a singular instance of Azure OpenAI Service that is shared across multiple internal departments. This shared use necessitates an efficient mechanism for allocating costs to each business unit or consumer, based on the number of tokens consumed. This article delves into how chargeback is calculated for each business unit based on their token usage. Leveraging Azure API Management Policies for Token Tracking Azure API Management Policies offer a powerful solution for monitoring and logging the token consumption for each internal application. The process can be summarized in the following steps: ** Sample Code: Refer to this GitHub repository to get a step-by-step instruction on how to build the solution outlined below : private-openai-with-apim-for-chargeback 1. Client Applications Authorizes to API Management To make sure only legitimate clients can call the Azure OpenAI APIs, each client must first authenticate against Azure Active Directory and call APIM endpoint. In this scenario, the API Management service acts on behalf of the backend API, and the calling application requests access to the API Management instance. The scope of the access token is between the calling application and the API Management gateway. In API Management, configure a policy (validate-jwt or validate-azure-ad-token) to validate the token before the gateway passes the request to the backend. 2. APIM redirects the request to OpenAI service via private endpoint. Upon successful verification of the token, Azure API Management (APIM) routes the request to Azure OpenAI service to fetch response for completions endpoint, which also includes prompt and completion token counts. 3. Capture and log API response to Event Hub Leveraging the log-to-eventhub policy to capture outgoing responses for logging or analytics purposes. To use this policy, a logger needs to be configured in the API Management: # API Management service-specific details $apimServiceName = "apim-hello-world" $resourceGroupName = "myResourceGroup" # Create logger $context = New-AzApiManagementContext -ResourceGroupName $resourceGroupName -ServiceName $apimServiceName New-AzApiManagementLogger -Context $context -LoggerId "OpenAiChargeBackLogger" -Name "ApimEventHub" -ConnectionString "Endpoint=sb://<EventHubsNamespace>.servicebus.windows.net/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<key>" -Description "Event hub logger with connection string" Within outbound policies section, pull specific data from the body of the response and send this information to the previously configured EventHub instance. This is not just a simple logging exercise; it is an entry point into a whole ecosystem of real-time analytics and monitoring capabilities: <outbound> <choose> <when condition="@(context.Response.StatusCode == 200)"> <log-to-eventhub logger-id="TokenUsageLogger">@{ var responseBody = context.Response.Body?.As<JObject>(true); return new JObject( new JProperty("Timestamp", DateTime.UtcNow.ToString()), new JProperty("ApiOperation", responseBody["object"].ToString()), new JProperty("AppKey", context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key",string.Empty)), new JProperty("PromptTokens", responseBody["usage"]["prompt_tokens"].ToString()), new JProperty("CompletionTokens", responseBody["usage"]["completion_tokens"].ToString()), new JProperty("TotalTokens", responseBody["usage"]["total_tokens"].ToString()) ).ToString(); }</log-to-eventhub> </when> </choose> <base /> </outbound> EventHub serves as a powerful fulcrum, offering seamless integration with a wide array of Azure and Microsoft services. For example, the logged data can be directly streamed to Azure Stream Analytics for real-time analytics or to Power BI for real-time dashboards With Azure Event Grid, the same data can also be used to trigger workflows or automate tasks based on specific conditions met in the incoming responses. Moreover, the architecture is extensible to non-Microsoft services as well. Event Hubs can interact smoothly with external platforms like Apache Spark, allowing you to perform data transformations or feed machine learning models. 4: Data Processing with Azure Functions An Azure Function is invoked when data is sent to the EventHub instance, allowing for bespoke data processing in line with your organization’s unique requirements. For instance, this could range from dispatching the data to Azure Monitor, streaming it to Power BI dashboards, or even sending detailed consumption reports via Azure Communication Service. [Function("TokenUsageFunction")] public async Task Run([EventHubTrigger("%EventHubName%", Connection = "EventHubConnection")] string[] openAiTokenResponse) { //Eventhub Messages arrive as an array foreach (var tokenData in openAiTokenResponse) { try { _logger.LogInformation($"Azure OpenAI Tokens Data Received: {tokenData}"); var OpenAiToken = JsonSerializer.Deserialize<OpenAiToken>(tokenData); if (OpenAiToken == null) { _logger.LogError($"Invalid OpenAi Api Token Response Received. Skipping."); continue; } _telemetryClient.TrackEvent("Azure OpenAI Tokens", OpenAiToken.ToDictionary()); } catch (Exception e) { _logger.LogError($"Error occured when processing TokenData: {tokenData}", e.Message); } } } In the example above, Azure function processes the tokens response data in Event Hub and sends them to Application Insights telemetry, and a basic Dashboard is configured in Azure, displaying the token consumption for each client application. This information can conveniently be used to compute chargeback costs. A sample query used in dashboard above that fetches tokens consumed by a specific client: customEvents | where name contains "Azure OpenAI Tokens" | extend tokenData = parse_json(customDimensions) | where tokenData.AppKey contains "your-client-key" | project Timestamp = tokenData.Timestamp, Stream = tokenData.Stream, ApiOperation = tokenData.ApiOperation, PromptTokens = tokenData.PromptTokens, CompletionTokens = tokenData.CompletionTokens, TotalTokens = tokenData.TotalTokens Azure OpenAI Landing Zone reference architecture A crucial detail to ensure the effectiveness of this approach is to secure the Azure OpenAI service by implementing Private Endpoints and using Managed Identities for App Service to authorize access to Azure AI services. This will limit access so that only the App Service can communicate with the Azure OpenAI service. Failing to do this would render the solution ineffective, as individuals could bypass the APIM/App Service and directly access the OpenAI Service if they get hold of the access key for OpenAI. Refer to Azure OpenAI Landing Zone reference architecture to build a secure and scalable AI environment. Additional Considerations If the client application is external, consider using an Application Gateway in front of the Azure APIM If "streaming" is set to true, tokens count is not returned in response. In that that case libraries like tiktoken (Python), orgpt-3-encoder(javascript) for most GPT-3 models can be used to programmatically calculate tokens count for the user prompt and completion response. A useful guideline to remember is that in typical English text, one token is approximately equal to around 4 characters. This equates to about three-quarters of a word, meaning that 100 tokens are roughly equivalent to 75 words. (P.S. Microsoft does not endorse or guarantee any third-party libraries.) A subscription key or a custom header like app-key can also be used to uniquely identify the client as appId in OAuth token is not very intuitive. Rate-limiting can be implemented for incoming requests using OAuth tokens or Subscription Keys, adding another layer of security and resource management. The solution can also be extended to redirect different clients to different Azure OpenAI instances. For example., some clients utilize an Azure OpenAI instance with default quotas, whereas premium clients get to consume Azure Open AI instance with dedicated throughput. Conclusion Azure OpenAI Service stands as an indispensable tool for organizations seeking to harness the immense power of language models. With the feature of provisioned throughput, clients can define their usage limits in throughput units and freely allocate these to the OpenAI model of their choice. However, the financial commitment can be significant and is dependent on factors like the chosen model's type, size, and utilization. An effective chargeback system offers several advantages, such as heightened accountability, transparent costing, and judicious use of resources within the organization.
shikhasinha
Jun 06, 2025 Place Apps on Azure Blog
21KViews
10likes
10Comments
Troubleshoot 500 BackendConnectionFailure SSL/TLS Error
This article introduces how to troubleshoot the 500 BackendConnectionFailure error with the error message 'The underlying connection was closed: Could not establish trust relationship for the SSL/TLS secure channel.'
jiayiwu
Jun 04, 2025 Place Azure PaaS Blog
21KViews
5likes
1Comment
Reimagining App Modernization for the Era of AI
This blog highlights the key announcements and innovations from Microsoft Build 2025. It focuses on how AI is transforming the software development lifecycle, particularly in app modernization. Key topics include the use of GitHub Copilot for accelerating development and modernization, the introduction of Azure SRE agent for managing production systems, and the launch of the App Modernization Guidance to help organizations modernize their applications with AI-first design. The blog emphasizes the strategic approach to modernization, aiming to reduce complexity, improve agility, and deliver measurable business outcomes
Ikennaokeke
May 19, 2025 Place Apps on Azure Blog
4.4KViews
2likes
0Comments