Azure API Management
From Timeouts to Triumph: Optimizing GPT-4o-mini for Speed, Efficiency, and Reliability
The Challenge
Large-scale generative AI deployments can stretch system boundaries — especially when thousands of concurrent requests require both high throughput and low latency. In one such production environment, GPT-4o-mini deployments running under Provisioned Throughput Units (PTUs) began showing sporadic 408 (timeout) and 429 (throttling) errors. Requests that normally completed in seconds were occasionally hitting the 60-second timeout window, causing degraded experiences and unnecessary retries. Initial suspicion pointed toward PTU capacity limitations, but deeper telemetry revealed a different cause.

What the Data Revealed
Using Azure Data Explorer (Kusto), API Management (APIM) logs, and OpenAI billing telemetry, a detailed investigation uncovered several insights:
- Latency was not correlated with PTU utilization: PTU resources were healthy and performing within SLA even during spikes.
- Time-Between-Tokens (TBT) stayed consistently low (~8–10 ms): the model was generating tokens steadily.
- Excessive token output was the real bottleneck: requests generating 6K–8K tokens simply required more time than the 60-second completion window allowed.
In short — the model wasn't slow; the workload was oversized.

The Optimization Opportunity
The analysis opened a broader optimization opportunity:
- Balance token length with throughput targets.
- Introduce architectural patterns to prevent timeout or throttling cascades under load.
- Enforce automatic token governance instead of relying on client-side discipline.

The Solution
Three engineering measures delivered immediate impact: token optimization, spillover routing, and policy enforcement.

Right-size the Token Budget
- Empirical throughput for GPT-4o-mini: ~33 tokens/sec → ~2K tokens in 60s.
- Enforced max_tokens = 2000 for synchronous requests.
- Enabled streaming responses for longer outputs, allowing incremental delivery without hitting timeout limits.
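The token-budget arithmetic above is simple enough to sanity-check in a few lines. The sketch below uses the ~33 tokens/sec and 60-second figures from the analysis; the 0.9 safety margin is an assumption, not a measured value:

```python
# Rough sketch: derive a conservative max_tokens for a synchronous request
# from observed generation throughput and the gateway timeout window.
# The 0.9 safety margin is an assumed buffer, not a measured value.
def max_tokens_for_timeout(tokens_per_second: float,
                           timeout_seconds: float,
                           safety_margin: float = 0.9) -> int:
    """Largest completion size that should finish inside the timeout."""
    return int(tokens_per_second * timeout_seconds * safety_margin)

# GPT-4o-mini figures from the analysis: ~33 tokens/sec, 60-second window.
budget = max_tokens_for_timeout(33, 60)
print(budget)  # 1782, consistent with the enforced max_tokens = 2000 ceiling
```

The same arithmetic explains the failures: a 6K–8K token completion at ~33 tokens/sec needs roughly 180–240 seconds, far beyond the 60-second window.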
Enable Spillover for Continuity
- Implemented multi-region spillover using Azure Front Door and APIM Premium gateways.
- When PTU queues reached capacity or 429s appeared, requests were routed to Standard deployments in secondary regions.
- The result: graceful degradation and an uninterrupted user experience.

Govern with APIM Policies
- Added inbound policies to inspect and adjust max_tokens dynamically.
- On 408/429 responses, APIM retried and rerouted traffic based on spillover logic.

The Results
After optimization, improvements were immediate and measurable:
- Latency reduction: significant improvement in end-to-end response times across high-volume workloads.
- Reliability gains: 408/429 errors fell from >1% to near zero.
- Cost efficiency: average token generation decreased by ~60%, reducing per-request costs.
- Scalability: spillover routing ensured consistent performance during regional or capacity surges.
- Governance: APIM policies established a reusable token-control framework for future AI workloads.

Lessons Learned
- Latency isn't always about capacity: investigate workload patterns before scaling hardware.
- Token budgets define the user experience: over-generation can quietly break SLA compliance.
- Design for elasticity: spillover and multi-region routing maintain continuity during spikes.
- Measure everything: combine KQL telemetry, latency, and token tracking for faster diagnostics.

The Outcome
By applying data-driven analysis, architectural tuning, and automated governance, the team turned an operational bottleneck into a model of consistent, scalable performance. The result: faster responses, lower costs, higher trust, and a blueprint for building resilient, high-throughput AI systems on Azure.

Running Self-hosted APIM Gateways in Azure Container Apps with VNet Integration
With Azure Container Apps we can run containerized applications completely serverless. The platform handles all the orchestration needed to dynamically scale based on your configured triggers (such as KEDA scalers) and can even scale to zero. I have been working with customers recently on Azure API Management (APIM), and a recurring topic is how to manage internal APIs without exposing a public IP while staying compliant from a security standpoint. That leads to the use of a self-hosted gateway: a managed gateway deployed within your network, allowing a unified approach to managing your APIs while keeping all API communication in-network. The self-hosted gateway is deployed as a container, and in this article we will walk through how to provision one on Azure Container Apps. I assume there is already an Azure APIM instance provisioned and will dive into creating and configuring the self-hosted gateway on ACA.

Prerequisites
As mentioned, ensure you have an existing Azure API Management instance. We will be using the Azure CLI to configure the container apps in this walkthrough. To run the commands, you need the Azure CLI installed on your local machine and the necessary permissions in your Azure subscription.

Retrieve Gateway Deployment Settings from APIM
First, we need to get the details for our gateway from APIM. Head over to the Azure portal and navigate to your API Management instance.
- In the left menu, under Deployment and infrastructure, select Gateways.
- Here, you'll find the gateway resource you provisioned. Click on it and go to Deployment.
- Copy the Gateway Token and Configuration endpoint values. (These tell the self-hosted gateway which APIM instance and gateway to register under.)

Create a Container Apps Environment
Next, we need to create a Container Apps environment.
This is where we will create the container app in which our self-hosted gateway will be hosted.

Create a VNet and subnet for the ACA environment
Because we want access to our internal APIs, the container apps environment needs a VNet with a subnet available. Note: if we're using workload profiles (we will in this walkthrough), then we need to delegate the subnet to Microsoft.App/environments.

# Create the vnet
az network vnet create --resource-group rgContosoDemo \
  --name vnet-contoso-demo \
  --location centralus \
  --address-prefix 10.0.0.0/16

# Create the subnet
az network vnet subnet create --resource-group rgContosoDemo \
  --vnet-name vnet-contoso-demo \
  --name infrastructure-subnet \
  --address-prefixes 10.0.0.0/23

# If you are using a workload profile (we are for this walkthrough), delegate the subnet
az network vnet subnet update --resource-group rgContosoDemo \
  --vnet-name vnet-contoso-demo \
  --name infrastructure-subnet \
  --delegations Microsoft.App/environments

Create the Container Apps environment in our VNet
Note that the environment must reference the delegated subnet to be VNet-integrated:

az containerapp env create --name aca-contoso-env \
  --resource-group rgContosoDemo \
  --location centralus \
  --enable-workload-profiles \
  --infrastructure-subnet-resource-id $(az network vnet subnet show \
      --resource-group rgContosoDemo \
      --vnet-name vnet-contoso-demo \
      --name infrastructure-subnet \
      --query id -o tsv)

Deploy the Self-Hosted Gateway to a Container App
Creating the environment takes about 10 minutes, and once complete, then comes the fun part: deploying the self-hosted gateway container image to a container app.

az containerapp create --name aca-apim-demo-gateway \
  --resource-group rgContosoDemo \
  --environment aca-contoso-env \
  --workload-profile-name "Consumption" \
  --image "mcr.microsoft.com/azure-api-management/gateway:2.5.0" \
  --target-port 8080 \
  --ingress 'external' \
  --env-vars "config.service.endpoint"="<YOUR_ENDPOINT>" "config.service.auth"="<YOUR_TOKEN>" "net.server.http.forwarded.proto.enabled"="true"

Here, replace <YOUR_ENDPOINT> and <YOUR_TOKEN> with the values you copied earlier.
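If you want to script the verification step rather than watch the portal, a small readiness poll against the gateway's status endpoint works. This is a sketch with a hypothetical URL; substitute your container app's FQDN and the status path shown in your gateway's deployment settings:

```python
import time
import urllib.request
import urllib.error

# Hypothetical placeholder: use your container app FQDN and the status
# endpoint path from your APIM gateway's deployment settings.
STATUS_URL = "https://aca-apim-demo-gateway.example.azurecontainerapps.io/status-0123456789abcdef"

def wait_for_gateway(url: str, attempts: int = 30,
                     delay_seconds: float = 10.0) -> bool:
    """Poll the self-hosted gateway status endpoint until it returns HTTP 200."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # gateway not reachable yet; retry after a short delay
        time.sleep(delay_seconds)
    return False
```

Run it after the `az containerapp create` call, e.g. `wait_for_gateway(STATUS_URL)`, before wiring clients to the gateway.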
Configure Ingress for the Container App

az containerapp ingress enable --name aca-apim-demo-gateway --resource-group rgContosoDemo --type external --target-port 8080

This command ensures that your container app is accessible externally.

Verify the Deployment
Finally, let's make sure everything is running smoothly. Navigate to the Azure portal and go to your Container Apps environment. Select the container app you created (aca-apim-demo-gateway) and navigate to Replicas to verify that it's running. You can also use the status endpoint of the self-hosted gateway to determine whether it's running:

curl -i https://aca-apim-demo-gateway.sillytreats-abcd1234.centralus.azurecontainerapps.io/status-012345678990abcdef

Verify Gateway Health in APIM
In the Azure portal, navigate to APIM and verify the gateway shows as healthy. Go to Deployment and infrastructure, select Gateways, then choose your gateway. The Overview page shows the status of your gateway deployment.

And that's it! You've successfully deployed an Azure APIM self-hosted gateway in Azure Container Apps with VNet integration, allowing access to your internal APIs with easy management from the APIM portal in Azure. This setup lets you manage your APIs efficiently while leveraging the scalability and flexibility of Azure Container Apps.

Calculating Chargebacks for Business Units/Projects Utilizing a Shared Azure OpenAI Instance
Azure OpenAI Service is at the forefront of technological innovation, offering REST API access to OpenAI's suite of language models, including GPT-4, GPT-35-Turbo, and the Embeddings model series.

Enhancing Throughput for Scale
As enterprises seek to deploy OpenAI's powerful language models across various business units, they often require granular control over configuration and performance metrics. To address this need, Azure OpenAI Service is introducing dedicated throughput, a feature that provides a dedicated connection to OpenAI models with guaranteed performance levels. Throughput is quantified in tokens per second (tokens/sec), allowing organizations to precisely measure and optimize performance for both prompts and completions. The provisioned throughput model provides enhanced management and adaptability for varying workloads, guaranteeing system readiness for spikes in demand. It also ensures a uniform user experience and steady performance for applications that require real-time responses.

Resource Sharing and Chargeback Mechanisms
Large organizations frequently provision a single instance of Azure OpenAI Service that is shared across multiple internal departments. This shared use necessitates an efficient mechanism for allocating costs to each business unit or consumer based on the number of tokens consumed. This article delves into how chargeback is calculated for each business unit based on its token usage.

Leveraging Azure API Management Policies for Token Tracking
Azure API Management policies offer a powerful solution for monitoring and logging the token consumption of each internal application. The process can be summarized in the following steps. (Sample code: refer to this GitHub repository for step-by-step instructions on building the solution outlined below: private-openai-with-apim-for-chargeback.)

1.
Client applications authorize to API Management
To make sure only legitimate clients can call the Azure OpenAI APIs, each client must first authenticate against Azure Active Directory and call the APIM endpoint. In this scenario, the API Management service acts on behalf of the backend API, and the calling application requests access to the API Management instance. The scope of the access token is between the calling application and the API Management gateway. In API Management, configure a policy (validate-jwt or validate-azure-ad-token) to validate the token before the gateway passes the request to the backend.

2. APIM redirects the request to the OpenAI service via private endpoint
Upon successful verification of the token, Azure API Management (APIM) routes the request to the Azure OpenAI service to fetch the response from the completions endpoint, which also includes prompt and completion token counts.

3. Capture and log the API response to Event Hubs
Leverage the log-to-eventhub policy to capture outgoing responses for logging or analytics purposes. To use this policy, a logger needs to be configured in API Management:

# API Management service-specific details
$apimServiceName = "apim-hello-world"
$resourceGroupName = "myResourceGroup"

# Create logger
$context = New-AzApiManagementContext -ResourceGroupName $resourceGroupName -ServiceName $apimServiceName
New-AzApiManagementLogger -Context $context -LoggerId "TokenUsageLogger" -Name "ApimEventHub" -ConnectionString "Endpoint=sb://<EventHubsNamespace>.servicebus.windows.net/;SharedAccessKeyName=<KeyName>;SharedAccessKey=<key>" -Description "Event hub logger with connection string"

(Note that the LoggerId must match the logger-id referenced in the outbound policy below.) Within the outbound policy section, pull specific data from the body of the response and send it to the previously configured Event Hub instance.
This is not just a simple logging exercise; it is an entry point into a whole ecosystem of real-time analytics and monitoring capabilities:

<outbound>
    <choose>
        <when condition="@(context.Response.StatusCode == 200)">
            <log-to-eventhub logger-id="TokenUsageLogger">@{
                var responseBody = context.Response.Body?.As<JObject>(true);
                return new JObject(
                    new JProperty("Timestamp", DateTime.UtcNow.ToString()),
                    new JProperty("ApiOperation", responseBody["object"].ToString()),
                    new JProperty("AppKey", context.Request.Headers.GetValueOrDefault("Ocp-Apim-Subscription-Key", string.Empty)),
                    new JProperty("PromptTokens", responseBody["usage"]["prompt_tokens"].ToString()),
                    new JProperty("CompletionTokens", responseBody["usage"]["completion_tokens"].ToString()),
                    new JProperty("TotalTokens", responseBody["usage"]["total_tokens"].ToString())
                ).ToString();
            }</log-to-eventhub>
        </when>
    </choose>
    <base />
</outbound>

Event Hubs serves as a powerful fulcrum, offering seamless integration with a wide array of Azure and Microsoft services. For example, the logged data can be streamed directly to Azure Stream Analytics for real-time analytics or to Power BI for real-time dashboards. With Azure Event Grid, the same data can also trigger workflows or automate tasks based on specific conditions met in the incoming responses. Moreover, the architecture is extensible to non-Microsoft services: Event Hubs can interact smoothly with external platforms like Apache Spark, allowing you to perform data transformations or feed machine learning models.

4. Data processing with Azure Functions
An Azure Function is invoked when data arrives at the Event Hub instance, allowing for bespoke data processing in line with your organization's unique requirements. For instance, this could range from dispatching the data to Azure Monitor, streaming it to Power BI dashboards, or sending detailed consumption reports via Azure Communication Services.
[Function("TokenUsageFunction")]
public async Task Run([EventHubTrigger("%EventHubName%", Connection = "EventHubConnection")] string[] openAiTokenResponse)
{
    // Event Hub messages arrive as an array
    foreach (var tokenData in openAiTokenResponse)
    {
        try
        {
            _logger.LogInformation($"Azure OpenAI Tokens Data Received: {tokenData}");
            var openAiToken = JsonSerializer.Deserialize<OpenAiToken>(tokenData);
            if (openAiToken == null)
            {
                _logger.LogError("Invalid OpenAI API token response received. Skipping.");
                continue;
            }
            _telemetryClient.TrackEvent("Azure OpenAI Tokens", openAiToken.ToDictionary());
        }
        catch (Exception e)
        {
            _logger.LogError(e, $"Error occurred when processing TokenData: {tokenData}");
        }
    }
}

In the example above, the Azure Function processes the token response data from Event Hubs and sends it to Application Insights telemetry, and a basic dashboard is configured in Azure displaying the token consumption of each client application. This information can conveniently be used to compute chargeback costs. A sample dashboard query that fetches the tokens consumed by a specific client:

customEvents
| where name contains "Azure OpenAI Tokens"
| extend tokenData = parse_json(customDimensions)
| where tokenData.AppKey contains "your-client-key"
| project Timestamp = tokenData.Timestamp,
          Stream = tokenData.Stream,
          ApiOperation = tokenData.ApiOperation,
          PromptTokens = tokenData.PromptTokens,
          CompletionTokens = tokenData.CompletionTokens,
          TotalTokens = tokenData.TotalTokens

Azure OpenAI Landing Zone reference architecture
A crucial detail to ensure the effectiveness of this approach is to secure the Azure OpenAI service by implementing private endpoints and using managed identities for App Service to authorize access to Azure AI services. This limits access so that only the App Service can communicate with the Azure OpenAI service.
Failing to do this would render the solution ineffective, as individuals could bypass the APIM/App Service and directly access the OpenAI service if they get hold of the access key for OpenAI. Refer to the Azure OpenAI Landing Zone reference architecture to build a secure and scalable AI environment.

Additional Considerations
- If the client application is external, consider using an Application Gateway in front of the Azure APIM.
- If "stream" is set to true, the token count is not returned in the response. In that case, libraries like tiktoken (Python) or gpt-3-encoder (JavaScript) for most GPT-3 models can be used to programmatically calculate the token count for the user prompt and completion response. A useful guideline: in typical English text, one token is approximately 4 characters. This equates to about three-quarters of a word, so 100 tokens are roughly equivalent to 75 words. (P.S. Microsoft does not endorse or guarantee any third-party libraries.)
- A subscription key or a custom header like app-key can also be used to uniquely identify the client, as the appId in the OAuth token is not very intuitive.
- Rate limiting can be implemented for incoming requests using OAuth tokens or subscription keys, adding another layer of security and resource management.
- The solution can also be extended to redirect different clients to different Azure OpenAI instances. For example, some clients utilize an Azure OpenAI instance with default quotas, whereas premium clients consume an Azure OpenAI instance with dedicated throughput.

Conclusion
Azure OpenAI Service stands as an indispensable tool for organizations seeking to harness the immense power of language models. With the feature of provisioned throughput, clients can define their usage limits in throughput units and freely allocate these to the OpenAI model of their choice.
However, the financial commitment can be significant and depends on factors like the chosen model's type, size, and utilization. An effective chargeback system offers several advantages, such as heightened accountability, transparent costing, and judicious use of resources within the organization.

Reimagining App Modernization for the Era of AI
This blog highlights the key announcements and innovations from Microsoft Build 2025. It focuses on how AI is transforming the software development lifecycle, particularly in app modernization. Key topics include the use of GitHub Copilot for accelerating development and modernization, the introduction of the Azure SRE agent for managing production systems, and the launch of the App Modernization Guidance to help organizations modernize their applications with AI-first design. The blog emphasizes a strategic approach to modernization, aiming to reduce complexity, improve agility, and deliver measurable business outcomes.

GenAI Gateway Accelerator
Every app will be reinvented with generative AI, and new apps will be built that weren't possible before. Generative AI helps build intelligent apps using large language model (LLM) capabilities. As the number of intelligent applications grows alongside the adoption of various LLMs, enterprises encounter significant challenges in efficiently federating and managing generative AI resources. Large enterprises look for a centralized "GenAI Gateway" solution that can seamlessly integrate, optimize, and distribute workloads across a federated network of GenAI resources. This blog post provides an overview of how Azure API Management can be used as a GenAI gateway, leveraging the new accelerator scenario named "GenAI Gateway Accelerator" published on the APIM Landing Zone Accelerator.

API teams and Platform teams for better API management
This article is written partly as a conversation. As developers, we usually have questions, and the idea is to lay out these questions and answer them in a conversational manner, thereby making it easier to understand.

API management
Let's provide some context on what API management is all about. API management is the process of creating and publishing web APIs, enforcing their usage policies, controlling access, nurturing the subscriber community, collecting and analyzing usage statistics, and reporting on performance. API management helps organizations publish APIs to external, partner, and internal developers to unlock the potential of their data and services.

All that sounds like a nice pitch, but what does it really mean? Let's break it down. You're a developer: you build an API, you deploy it, and you may even have a few users. Then what happens? You need to manage it. By managing it we mean that you need to:

- Control who can access it. This can be as complicated as you like. You may start with admin users and normal users, and define different access levels accordingly. That's great for a while. But what if you have a lot of users? What if your API is used in a large organization with different departments, or in many different apps? You need to manage all that.
- Monitor its usage. You need to know how your API is used: how many times it's called, by whom, when, and how long it takes to respond, all to ensure your users are happy and you use your resources efficiently. The latter is especially important if you have a lot of users and pay for resources.
- Secure it. In most cases, you can't ship something without securing it. You need to ensure that only the right people at the right time can access your API, and that your API is not misused.
- Scale it. Of course, scaling is a big topic. Just put it in the cloud, right? Well, yes, but you need to ensure that your API can handle the load.
You need to ensure that your API can scale up and down as needed. There's a lot to think about here.

Ok, so I pick the right libraries and deploy it to the cloud. Do I need a cloud service specifically focused on API management? Well, as with everything, it depends. If you feel some of the following symptoms, you might want to consider using an API management service:

- Symptom 1: Inconsistent API performance. If your APIs are experiencing frequent performance issues, such as slow response times or high error rates, an Azure API Management service can help monitor, diagnose, and optimize API performance.
- Symptom 2: Security concerns. If you're worried about unauthorized access to your APIs or data breaches, Azure API Management services provide robust security features, including authentication, authorization, and threat protection.
- Symptom 3: Complex API lifecycle management. Managing multiple API versions, deprecating old APIs, and ensuring seamless transitions can be challenging. Azure API Management services offer tools for API lifecycle management, simplifying these processes.
- Symptom 4: Lack of analytics and monitoring. If you lack visibility into API usage, performance metrics, and error rates, Azure API Management services provide comprehensive analytics and monitoring capabilities to help you track and optimize API usage.
- Symptom 5: Integration challenges. If integrating various applications, systems, and services is becoming increasingly complex, Azure API Management services can centralize and streamline these integrations, making it easier to connect different components of your technology stack.

Ok, now we know what to look for in terms of symptoms, but let's talk about API teams and platform teams next.

API Teams and Platform Teams
You usually don't start out with a large organization with many APIs and many users. You start small and grow your business from there.
As your business grows, you may have more APIs and more users, and you find the need for different teams to manage your APIs. You might have an API team and an operations (Ops) team. At some point, though, you start seeing problems like these:

- Scalability and standardization: your company is growing rapidly and needs to standardize processes across multiple teams and projects; platform teams can help create consistent, reusable infrastructure and tools.
- Complex integrations: you're dealing with complex integrations between various systems and services.
- Security and compliance: as your company scales, maintaining security and compliance becomes more challenging.
- Operational efficiency: operational inefficiencies start to impact productivity.
- Cloud adoption: you've started using the cloud but are not using it to its full potential, for example through inefficient resource allocation.

Let's have a chat with Ella and Dave next to see where they are in their journey.

Meet Ella and Dave
Meet Ella and Dave. Ella is an experienced developer who has been using Azure API Management for a while, including workspaces. Dave is new to Azure API Management and is looking to learn more about it.

New to Azure API Management
Dave: Hey Ella, I recently signed up for Azure API Management to manage my APIs. I heard you've been using Azure API Management workspaces. Can you tell me more about them?

Ella: Absolutely, Dave! Azure API Management workspaces have been a game-changer for our team. They provide a structured environment for managing APIs, which is especially useful for federated API management.

"Federated" API management
Dave: Federated API management? What's that exactly?

Ella: Federated API management is a decentralized approach where different teams within an organization manage their APIs independently. This allows for greater flexibility and responsiveness. However, it still maintains centralized governance to ensure consistency, security, and compliance across the organization.
Dave: That sounds interesting. Why is federated API management so important?

Ella: There are several reasons:
- Enhanced flexibility: teams can manage their APIs independently, allowing them to respond quickly to specific needs and changes without waiting for centralized approval.
- Improved collaboration: different teams can work on their APIs simultaneously, fostering collaboration and reducing bottlenecks.
- Scalability: as organizations grow, federated API management allows for scalable API management practices that can adapt to increasing complexity and volume.
- Centralized governance: despite decentralization, centralized governance ensures that all APIs adhere to organizational standards, security policies, and compliance requirements.

Dave: That makes sense. So, who are the main teams involved in federated API management?

API teams and platform teams
Ella: Primarily, there are two key teams:
- Platform teams: responsible for the overall infrastructure and tools that support API development and management. They ensure that the API management platform is robust, scalable, and secure.
- API teams: focused on the development, maintenance, and optimization of individual APIs. They create APIs that meet specific business needs and integrate seamlessly with other systems.

How do Azure API Management workspaces enable federated API management?
Dave: Got it. How do Azure API Management workspaces enable federated API management?

Ella: Workspaces are the only way to use federated API management in Azure API Management. They enable it through isolation of the control plane, optional isolation of the data plane, and platform-level governance controls:
- Control plane isolation: allows teams to independently build and manage their APIs.
- API runtime isolation: isolates faults.
- API platform team controls: federate API monitoring, enforce policies, and unify API discovery.

Dave: So, it sounds like workspaces really help with collaboration and governance.
Ella: Workspaces isolate teams and give them the autonomy to manage APIs without running into conflicts with each other. Using workspaces enables that isolation between existing teams and future teams onboarded onto the platform, which makes it easier to grow, scale, and operate API ecosystems.

Getting started
Dave: That's really helpful, Ella. I think I'll look into setting up Azure API Management workspaces for my team. Any tips on getting started?

Ella: Sure! Check out the resources below.

Dave: Thanks, Ella. This has been really insightful. I'm excited to get started with Azure API Management workspaces!

Ella: Anytime, Dave! Feel free to reach out if you have any more questions. Good luck with your APIs!

Resources:
- Workspaces in Azure API Management | Microsoft Learn
- Set up a workspace in Azure API Management | Microsoft Learn
- Deploy an app to Azure API Management with generative AI features

Introducing the new community site for Azure API Management aka.ms/apimlove
In this article, we'll showcase the new community site for Azure API Management: https://aka.ms/apimlove. This site is a one-stop shop for all things related to Azure API Management, including videos, tutorials, and community resources. Want your blog posts and videos featured? Reply in the comments.

What is Azure API Management?
If you're completely new to Azure API Management, it's a service that allows you to create, publish, and manage APIs in a secure and scalable way. With Azure API Management, you can expose your APIs to external developers, partners, and internal teams, and monitor and analyze their usage. Additionally, it has a great story around generative AI and how you can take these APIs to production. How, you might wonder? The short answer is that there are policies made specifically for generative AI that you can apply to your APIs. Let's describe them in a bit more detail.

Problems and Solutions with generative AI APIs
A good way to understand the capabilities of Azure API Management is to look at some common problems faced when working with generative AI APIs and how Azure API Management solves them. Here are some key problems and solutions:

Token usage management
- Problem: Tracking token usage across multiple applications and ensuring fair distribution.
- Solution: Azure APIM provides a token limit policy that allows you to set quotas on token usage per application. This ensures that no single application consumes the entire token quota.

Load balancing and error management
- Problem: Distributing load across multiple instances and managing errors to ensure high availability.
- Solution: APIM supports load balancing to distribute requests across multiple endpoints. It also implements a circuit breaker pattern to stop requests to failing instances and redirect them to healthy ones.

Monitoring and metrics
- Problem: Monitoring API usage, request success/failure rates, and token consumption.
- Solution: APIM provides detailed monitoring and metrics capabilities, including policies to emit token usage metrics. This helps in tracking how many tokens are used and how many are left.

Security
- Problem: Securing API access and managing API keys.
- Solution: APIM allows you to use managed identities for secure authentication, reducing the need to distribute API keys manually. This enhances security by ensuring only authorized applications can access the APIs.

Cost management
- Problem: Managing costs associated with API usage and ensuring efficient resource utilization.
- Solution: APIM helps by caching responses to reduce load on the AI model, saving costs and improving performance. It also ensures that committed capacity in Provisioned Throughput Units (PTUs) is exhausted before falling back to pay-as-you-go instances.

Summary: Key Policies and Constructs
- Token limit policy: sets quotas on token usage per application.
- Emit token metric policy: tracks and emits metrics related to token usage.
- Load balancing: distributes requests across multiple endpoints.
- Circuit breaker pattern: manages errors by redirecting requests from failing instances to healthy ones.
- Managed identities: provide secure authentication without the need for API keys.
- Caching: reduces load on the AI model by caching responses.

Learn more here.

Community Site Features
So what does this new site offer? Here are some key features:
- Videos: watch tutorials and demos on Azure API Management and generative AI.
- Tutorials: step-by-step guides on how to use Azure API Management for generative AI.
- Community resources: connect with other developers, share tips and tricks, and get help with your projects.

This site will be updated regularly with new content, so be sure to check back often for the latest updates.
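To make the token limit policy summarized above concrete, here is a minimal sketch of the idea behind it: a per-application token quota over fixed one-minute windows. This illustrates the concept only; in APIM the behavior is configured declaratively through the token-limit policy, not hand-coded:

```python
import time
from collections import defaultdict
from typing import Optional

class TokenQuota:
    """Per-app-key token quota over fixed one-minute windows (concept sketch)."""

    def __init__(self, tokens_per_minute: int):
        self.limit = tokens_per_minute
        self.used = defaultdict(int)            # app key -> tokens used in window
        self.window_start = defaultdict(float)  # app key -> window start time

    def try_consume(self, app_key: str, tokens: int,
                    now: Optional[float] = None) -> bool:
        """Return True if the request fits the quota; False means 'send 429'."""
        now = time.monotonic() if now is None else now
        if now - self.window_start[app_key] >= 60:
            self.window_start[app_key] = now    # start a fresh window
            self.used[app_key] = 0
        if self.used[app_key] + tokens > self.limit:
            return False
        self.used[app_key] += tokens
        return True

quota = TokenQuota(tokens_per_minute=10_000)
print(quota.try_consume("app-a", 6_000, now=0.0))  # True
print(quota.try_consume("app-a", 6_000, now=1.0))  # False: would exceed 10K TPM
print(quota.try_consume("app-b", 6_000, now=1.0))  # True: separate app key
```

The key design point, which carries over to the real policy, is that the quota is keyed per application (for example, per subscription key), so one heavy consumer cannot starve the others.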
Resources
- Community site: https://aka.ms/apimlove
- Azure API Management documentation: https://docs.microsoft.com/en-us/azure/api-management/
- Generative AI gateway capabilities: https://learn.microsoft.com/en-us/azure/api-management/genai-gateway-capabilities

Unlock New AI and Cloud Potential with .NET 9 & Azure: Faster, Smarter, and Built for the Future
.NET 9, now available to developers, marks a significant milestone in the evolution of the .NET platform, pushing the boundaries of performance, cloud-native development, and AI integration. This release, shaped by contributions from over 9,000 community members worldwide, introduces thousands of improvements that set the stage for the future of application development. With seamless integration with Azure and a focus on cloud-native development and AI capabilities, .NET 9 empowers developers to build scalable, intelligent applications with unprecedented ease.

Expanding Azure PaaS Support for .NET 9

With the release of .NET 9, a comprehensive range of Azure Platform as a Service (PaaS) offerings now fully support the platform's new capabilities, including the latest .NET SDK, for any Azure developer. This extensive support allows developers to build, deploy, and scale .NET 9 applications with optimal performance and adaptability on Azure. Additionally, developers can access a wealth of architecture references and sample solutions to guide them in creating high-performance .NET 9 applications on Azure's powerful cloud services:

- Azure App Service: Run, manage, and scale .NET 9 web applications efficiently. Check out this blog to learn more about what's new in Azure App Service.
- Azure Functions: Leverage serverless computing to build event-driven .NET 9 applications with improved runtime capabilities.
- Azure Container Apps: Deploy microservices and containerized .NET 9 workloads with integrated observability.
- Azure Kubernetes Service (AKS): Run .NET 9 applications in a managed Kubernetes environment with expanded ARM64 support.
- Azure AI Services and Azure OpenAI Services: Integrate advanced AI and OpenAI capabilities directly into your .NET 9 applications.
- Azure API Management, Azure Logic Apps, Azure Cognitive Services, and Azure SignalR Service: Ensure seamless integration and scaling for .NET 9 solutions.
These services provide developers with a robust platform to build high-performance, scalable, and cloud-native applications while leveraging Azure's optimized environment for .NET.

Streamlined Cloud-Native Development with .NET Aspire

.NET Aspire is a game-changer for cloud-native applications, enabling developers to build distributed, production-ready solutions efficiently. Available in preview with .NET 9, Aspire streamlines app development, with cloud efficiency and observability at its core. The latest updates in Aspire include secure defaults, Azure Functions support, and enhanced container management. Key capabilities include:

- Optimized Azure Integrations: Aspire works seamlessly with Azure, enabling fast deployments, automated scaling, and consistent management of cloud-native applications.
- Easier Deployments to Azure Container Apps: Designed for containerized environments, .NET Aspire integrates with Azure Container Apps (ACA) to simplify the deployment process. Using the Azure Developer CLI (azd), developers can quickly provision and deploy .NET Aspire projects to ACA, with built-in support for Redis caching, application logging, and scalability.
- Built-In Observability: A real-time dashboard provides insights into logs, distributed traces, and metrics, enabling local and production monitoring with Azure Monitor.

With these capabilities, .NET Aspire allows developers to deploy microservices and containerized applications effortlessly on ACA, streamlining the path from development to production in a fully managed, serverless environment.

Integrating AI into .NET: A Seamless Experience

In our ongoing effort to empower developers, we've made integrating AI into .NET applications simpler than ever. Our strategic partnerships, including collaborations with OpenAI, LlamaIndex, and Qdrant, have enriched the AI ecosystem and strengthened .NET's capabilities.
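The azd-based deployment path for .NET Aspire described in the Aspire section above boils down to a short CLI sequence. This is a sketch assuming an existing .NET Aspire project; the exact prompts and provisioned resources depend on your app host configuration:

```shell
# From the root of the .NET Aspire solution:
# scaffold azd configuration from the Aspire app host project
azd init

# Provision the Azure resources (Azure Container Apps environment,
# container registry, logging) and deploy all Aspire services in one step
azd up
```

Subsequent code-only changes can typically be pushed with `azd deploy`, without re-provisioning infrastructure.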
This year alone, usage of Azure OpenAI services has surged to nearly a billion API calls per month, illustrating the growing impact of AI-powered .NET applications.

Real-World AI Solutions with .NET

.NET has been pivotal in driving AI innovations. From internal teams like Microsoft Copilot creating AI experiences with .NET Aspire to tools like GitHub Copilot, developed with .NET to enhance productivity in Visual Studio and VS Code, the platform showcases AI at its best. KPMG Clara is a prime example, developed to enhance audit quality and efficiency for 95,000 auditors worldwide. By leveraging .NET and scaling securely on Azure, KPMG implemented robust AI features aligned with strict industry standards, underscoring .NET and Azure as the backbone for high-performing, scalable AI solutions.

Performance Enhancements in .NET 9: Raising the Bar for Azure Workloads

.NET 9 introduces substantial performance upgrades with over 7,500 merged pull requests focused on speed and efficiency, ensuring .NET 9 applications run optimally on Azure. These improvements contribute to reduced cloud costs and provide a high-performance experience across Windows, Linux, and macOS. To see how significant these performance gains can be for cloud services, take a look at what past .NET upgrades achieved for Microsoft's high-scale internal services:

- Bing achieved a major reduction in startup times, enhanced efficiency, and decreased latency across its high-performance search workflows.
- Microsoft Teams improved efficiency by 50%, reduced latency by 30–45%, and achieved up to 100% gains in CPU utilization for key services, resulting in faster user interactions.
- Microsoft Copilot and other AI-powered applications benefited from optimized runtime performance, enabling scalable, high-quality experiences for users.

Upgrading to the latest .NET version offers similar benefits for cloud apps, optimizing both performance and cost-efficiency.
For more information on updating your applications, check out the .NET Upgrade Assistant. For additional details on ASP.NET Core, .NET MAUI, NuGet, and more enhancements across the .NET platform, check out the full Announcing .NET 9 blog post.

Conclusion: Your Path to the Future with .NET 9 and Azure

.NET 9 isn't just an upgrade: it's a leap forward, combining cutting-edge AI integration, cloud-native development, and unparalleled performance. Paired with Azure's scalability, these advancements provide a trusted, high-performance foundation for modern applications.

Get started by downloading .NET 9 and exploring its features. Leverage .NET Aspire for streamlined cloud-native development, deploy scalable apps with Azure, and embrace new productivity enhancements to build for the future. Explore the future of cloud-native and AI development with .NET 9 and Azure, your toolkit for creating the next generation of intelligent applications.

Past due! Act now to upgrade from these retired Azure services
This is your friendly reminder that the following Azure services were retired on August 31, 2024:

- Azure App Service Environment v1 and v2
- Logic Apps Integration Service Environment
- Azure API Management stv1

Congratulations to the majority of our customers who have completed the migration to the latest versions! Your timely actions have ensured the continued security and performance of your applications and data.

For those still running the retired environments, it is crucial to migrate immediately to avoid security risks and data loss. As part of the retirement process, Azure has already begun decommissioning the hardware. Your retired environment may experience intermittent outages, or it may be suspended. Please complete your migration as soon as possible.

Azure App Service Environment (ASE) v1 and v2

If your environment experiences any intermittent outages, it is important that you acknowledge the outages in the Azure portal and begin work to migrate immediately. You may also request a grace period to complete the migration. If there is no request for a grace period and no action from customers after repeated reminders, the environment may be suspended or deleted, or we may attempt to auto-migrate it to the new version. Please consider this only as a last resort and complete the migration using the available resources; this last-resort scenario may require additional configuration from customers to bring the applications back online. If your environment has been automatically migrated, please visit the product documentation to learn more: Prevent and recover from an auto-migration of an App Service Environment - Azure App Service Environment | Microsoft Learn.

Logic Apps Integration Service Environment (ISE)

Customers who remain on ISE after the retirement date may have experienced outages. To avoid service disruptions, please export your logic apps workflows from ISE to Logic Apps Standard at the earliest.
As of October 1, 2024, Logic Apps executions on all ISE Developer and ISE Premium instances have been stopped, and these instances are now read-only. Logic Apps deployed to these instances will be available for export for a limited time. From January 6, 2025, all ISE instances (Developer and Premium) will start being deleted, incurring loss of data. Additionally, read-only instances will continue to incur standard charges; to avoid unnecessary costs, we recommend customers delete any instances that are no longer in use.

Azure API Management stv1

Customers who remain on APIM stv1 after the retirement date may have experienced outages. As of October 1, 2024, remaining APIM stv1 service instances have started to undergo automatic migration to the APIM stv2 compute platform. Automatic migration may cause downtime for upstream API consumers, and customers may need to update their network dependencies. All affected customers will be notified of the ongoing automatic migration one week in advance through emails to the subscription administrators and Azure Service Health portal notifications. To avoid service disruptions, please migrate instances running on stv1 to stv2 at the earliest. The latest migration option addresses the networking dependencies, particularly the need for new subnets and IP changes: you can now retain the original IPs, both public and private, significantly simplifying the migration process.

What is the impact on support and SLA?

As of September 1, 2024, the Service Level Agreement (SLA) no longer applies to continued use of the retired products beyond the retirement date. Azure customer support will continue to handle support cases in a commercially reasonable manner. No new security and compliance investments will be made. The ability to effectively mitigate issues that might arise from lower-level Azure dependencies may be impaired due to the retirement.

What is the call to action?
If you are still running one or more of the following services, please use the available resources listed here to complete the migration at the earliest.

- Announcement: App Service Environment version 1 and version 2 will be retired on 31 August 2024
  - Learn live: Episode 1; Bonus episode: Side by side migration
  - Migration resources: App Service Environment version 3 migration; Using the in-place migration feature; Auto migration overview and grace period
  - Public resources: Estimate your cost savings
- Announcement: Integration Services Environment will be retired on 31 August 2024
  - Learn live: Episode 2
  - Migration resources: Logic Apps Standard migration; Export ISE workflows to a Standard logic app
  - Public resources: ISE Retirement FAQ
- Announcement: Support for API Management instances hosted on the stv1 platform will be retired by 31 August 2024
  - Learn live: Episode 3
  - Migration resources: API Management STV2 migration

Exciting Updates Coming to Conversational Diagnostics (Public Preview)
Last year, at Ignite 2023, we unveiled Conversational Diagnostics (Preview), a revolutionary tool integrated with AI-powered capabilities to enhance problem-solving for Windows Web Apps. This year, we're thrilled to share what's new and forthcoming for Conversational Diagnostics (Preview). Get ready to experience a broader range of functionalities and expanded support across various Azure products, making your troubleshooting journey even more seamless and intuitive.