azure functions
257 Topics

Host ChatGPT apps on Azure Functions
This blog post is for developers learning about and building ChatGPT apps. It provides an overview of how these apps work, why you might build them, and how to host one on Azure Functions.

Chat with ChatGPT apps

OpenAI recently launched ChatGPT apps. These are apps you can chat with right inside ChatGPT, extending what ChatGPT can do beyond simple chats to actions. These apps can be invoked by starting a message with the app name, or they can be suggested by ChatGPT when relevant to the conversation. The following shows an example of invoking the Booking.com app to find hotels that meet certain criteria:

OpenAI calls these “a new generation of apps” that “blend familiar interactive elements…with new ways of interacting through conversation.” For users, ChatGPT apps fit directly into an interface they’re already familiar with and can use with little to no learning. For developers, building these apps is a great way to get them into the hands of ChatGPT’s 800 million users without having to build custom frontends or worry about distribution and discovery.

The following summarizes key benefits of ChatGPT apps:

Native Integration: Once connected, users can invoke apps with a simple @ mention.
Contextual Actions: Your app doesn't just "chat"; it acts. It can fetch real-time data or execute actions.
Massive Distribution and Easy Discovery: ChatGPT has added an app directory and just announced that it is accepting submissions. Apps in the directory are exposed to ChatGPT’s massive user base.

ChatGPT apps are remote MCP servers

ChatGPT apps are simply remote MCP servers that expose tools, but with two notable distinctions:

Their tools use metadata to specify UI elements that should be rendered when a tool returns a result.
The UI elements are exposed as MCP resources.

ChatGPT invokes tools the same way agents invoke tools on any MCP server. The difference is the added ability to render the tool results in a custom UI that’s embedded in the chat as an iframe.
A UI can include buttons, text boxes, maps, and other components that users can interact with. Instead of calling a RESTful API, the UI can trigger additional tool calls in the MCP server as the user interacts with it. For example, when the Zillow app returns results to the user’s question, the results are home listings and a map that users can interact with:

Since ChatGPT apps are just MCP servers, any existing server you may have can be turned into a ChatGPT app. To do that, you must ensure the server uses the streamable HTTP transport if it doesn’t already and then find a place to host it.

Hosting remote MCP servers

While there are many hosting platforms available, Azure Functions is uniquely positioned to host remote MCP servers because the platform provides several key benefits:

Scalable Infrastructure: ChatGPT apps can go viral. The Azure Functions Flex Consumption plan can handle bursty traffic during peak times and scale back to zero when idle.
Built-in auth: Keep your server secured with the built-in authentication and authorization feature of Azure Functions.
Serverless billing: Pay only when the app runs, not for idle time.

Learn more about remote MCP servers hosted on Azure Functions.

Create ChatGPT app

Let’s quickly create a sample ChatGPT app that returns the weather of a place.

Prerequisites

Ensure you have the following prerequisites before proceeding:

Azure subscription for creating Azure Functions and related resources
Azure Developer CLI for deploying the MCP server via infrastructure as code
ChatGPT Plus subscription for testing the ChatGPT app in developer mode

Deploy MCP server to Azure Functions

Clone this sample MCP server: `git clone https://github.com/Azure-Samples/chatgpt-app-azure-function-mcp`.
Open a terminal, run `azd auth login`, and complete the login flow in the browser.
Navigate to the sample root directory and run `azd up` to deploy the server and related resources.
You’ll be prompted with:

Enter a unique environment name: Enter a unique name. This is the name of the resource group where all deployed resources live.
Select an Azure Subscription: Pick your subscription.
Enter a value for the ‘location’ infrastructure: East US

Once deployment completes, copy the app URL for the next step. It should look like: https://<your-app>.azurewebsites.net

Sample code walkthrough

The sample server is built using the Python FastMCP package. You can find more information, including how to test the server locally, in this repo. We'll walk through the code briefly here.

In main.py, you'll find the `get_weather_widget` resource and the `get_current_weather` tool (code abbreviated here):

    @mcp.resource("ui://widget/current-weather.html", mime_type="text/html+skybridge")
    def get_weather_widget() -> str:
        """Interactive HTML widget to display current weather data in ChatGPT."""
        # some code...

    @mcp.tool(
        annotations={
            "title": "Get Current Weather",
            "readOnlyHint": True,
            "openWorldHint": True,
        },
        meta={
            "openai/outputTemplate": "ui://widget/current-weather.html",
            "openai/toolInvocation/invoking": "Fetching weather data",
            "openai/toolInvocation/invoked": "Weather data retrieved"
        },
    )
    def get_current_weather(latitude: float, longitude: float) -> ToolResult:
        """Get current weather for a given latitude and longitude using Open-Meteo API."""
        # some code...
        return ToolResult(
            content=content_text,
            structured_content=data
        )

When you ask ChatGPT a question, it calls the MCP tool, which returns a `ToolResult` containing both human-readable content (for ChatGPT to understand) and machine-readable data (`structured_content`, raw data for the widget). Because the `get_current_weather` tool specifies an `outputTemplate` in the metadata, ChatGPT fetches the corresponding widget HTML from the `get_weather_widget` resource. To display the results, it creates an iframe and injects the weather results (`structured_content`) into the widget's JavaScript environment (via `window.openai.toolOutput`).
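To make the two halves of the tool result concrete, here is a minimal sketch in plain Python of how a weather tool might assemble them. It does not use FastMCP; `build_weather_result` and the field names in `data` (such as `temperature_c`) are hypothetical and are not taken from the sample repo.

```python
from typing import Any


def build_weather_result(latitude: float, longitude: float,
                         temperature_c: float, wind_kmh: float) -> dict[str, Any]:
    """Build the two halves of a tool result (hypothetical helper, not from the sample)."""
    # Machine-readable payload: the raw data a widget would render.
    # The field names are assumptions made for this illustration.
    data = {
        "latitude": latitude,
        "longitude": longitude,
        "temperature_c": temperature_c,
        "wind_kmh": wind_kmh,
    }
    # Human-readable summary: the text the model reads to understand the result.
    content_text = (
        f"Current weather at ({latitude}, {longitude}): "
        f"{temperature_c:.1f} °C, wind {wind_kmh:.0f} km/h."
    )
    return {"content": content_text, "structured_content": data}


result = build_weather_result(40.71, -74.01, 21.5, 12.0)
print(result["content"])
```

In the sample server, values like these are what get passed as `content` and `structured_content` to `ToolResult`.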
The widget's JavaScript then renders the weather data into a beautiful UI.

Test ChatGPT app in developer mode

Turn on Developer mode in ChatGPT: Go to Settings → Connectors → Advanced → Developer mode
In the chat, click + → More → Add sources
The Add + button should show next to Sources. Click Add + → Connect more
In the Enable apps window, look for Advanced settings. Click Create app. A form should open.
Fill out the form to create the new app:
Name: WeatherApp
MCP Server URL: Enter the MCP server endpoint, which is the app URL you previously saved with /mcp appended. Example: https://<your-app>.azurewebsites.net/mcp
Authentication: Choose No Auth
Check the box for “I understand and want to continue” and click Create.
Once connected, you should find the server listed under Enabled apps.
Test by asking ChatGPT “@WeatherApp what’s the temperature in NYC today?”

Submit to ChatGPT App Directory

OpenAI recently opened app submissions. Submitting the app to the App Directory makes it accessible to all users on ChatGPT. You may want to read through the submission guidelines to ensure your app meets the requirements before submitting.

What’s next

In this blog post, we gave an overview of ChatGPT apps and showed how to host one on Azure Functions. We’ll dedicate the next blog post to configuring authentication and authorization for apps hosted on Azure Functions.

For users familiar with the Azure Functions MCP extension, we’re working on support for MCP Resources in the extension. You’ll be able to build ChatGPT apps using the extension once that support is out. For now, you need to use the official MCP SDKs.

Closing thoughts

ChatGPT apps extend the ability of ChatGPT beyond chat by letting users take actions like searching for an apartment, ordering groceries, and turning an outline into a slide deck with just a mention of the app name in the chat. The directory OpenAI created where developers can submit their apps is reminiscent of the App Store on the iPhone.
It seems to be a no-brainer now that such a marketplace should be provided. Would this also be the case for ChatGPT? Do you think the introduction of these apps is a game changer? And are they useful for your scenarios? Share your thoughts with us!

Building Reliable AI Travel Agents with the Durable Task Extension for Microsoft Agent Framework
The durable task extension for Microsoft Agent Framework makes all this possible. In this post, we'll walk through the AI Travel Planner, a C# application I built that demonstrates how to build reliable, scalable multi-agent applications using the durable task extension for Microsoft Agent Framework. While I work on the Python version, I've included code snippets that show the Python equivalent.

If you haven't already seen the announcement post on the durable task extension for Microsoft Agent Framework, I suggest you read that first before continuing with this post: http://aka.ms/durable-extension-for-af-blog.

In brief, production AI agents face real challenges: crashes can lose conversation history, unpredictable behavior makes debugging difficult, human-in-the-loop workflows require waiting without wasting resources, and variable demand needs flexible scaling. The durable task extension addresses each of these:

Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture.
Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances
Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns
Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs
Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard

AI Travel Planner Architecture Overview

The Travel Planner application takes user trip preferences and starts a workflow that orchestrates three specialized Agent Framework agents (a Destination Recommender, an Itinerary Planner, and a Local Recommendations agent) to build a comprehensive, personalized travel plan. Once a travel plan is created, the workflow includes human-in-the-loop approval before booking the trip (mocked), showcasing how the durable task extension handles long-running operations easily:

Application Workflow

User Request: User submits travel preferences via React frontend
Orchestration Scheduled: Azure Functions backend receives the request and schedules a deterministic agentic workflow using the Durable Task Extension for Agent Framework.
Destination Recommendation: The orchestrator first coordinates the Destination Recommender agent to analyze preferences and suggest destinations
Itinerary Planning and Local Recommendations: The orchestrator then runs two agents in parallel: the Itinerary Planner agent, which creates detailed day-by-day plans for the given destination, and the Local Recommendations agent, which adds insider tips and attractions
Storage: The created travel plan is saved to Azure Blob Storage
Approval: User reviews and approves the plan (human-in-the-loop)
Booking: Upon approval, booking of the trip completes

Key Components

Azure Static Web Apps: Hosts the React frontend
Azure Functions (.NET 9): Serverless compute hosting the agents and workflow with automatic scaling
Durable Task Extension for Microsoft Agent Framework: The AI agent SDK with durable task extension
Durable Task Scheduler: Manages state persistence, orchestration, and observability
Azure OpenAI (GPT-4o-mini): Powers the AI agents

Now let’s dive into the code. Along the way, I’ll highlight the value the durable task extension brings and patterns you can apply to your own applications.

Creating Durable Agents

Making standard Agent Framework agents durable is simple.
Include the durable task extension package and register your agents within the ConfigureDurableAgents extension method, and you automatically get:

Persistent conversation sessions that survive restarts
HTTP endpoints for agent interactions
Automatic state checkpointing that survives restarts
Distributed execution across instances

C#

    FunctionsApplication
        .CreateBuilder(args)
        .ConfigureDurableAgents(configure =>
        {
            configure.AddAIAgentFactory("DestinationRecommenderAgent", sp =>
                chatClient.CreateAIAgent(
                    instructions: "You are a travel destination expert...",
                    name: "DestinationRecommenderAgent",
                    services: sp));

            configure.AddAIAgentFactory("ItineraryPlannerAgent", sp =>
                chatClient.CreateAIAgent(
                    instructions: "You are a travel itinerary planner...",
                    name: "ItineraryPlannerAgent",
                    services: sp,
                    tools: [AIFunctionFactory.Create(CurrencyConverterTool.ConvertCurrency)]));

            configure.AddAIAgentFactory("LocalRecommendationsAgent", sp =>
                chatClient.CreateAIAgent(
                    instructions: "You are a local expert...",
                    name: "LocalRecommendationsAgent",
                    services: sp));
        });

Python

    # Create the Azure OpenAI chat client
    chat_client = AzureOpenAIChatClient(
        endpoint=endpoint,
        deployment_name=deployment_name,
        credential=DefaultAzureCredential()
    )

    # Destination Recommender Agent
    destination_recommender_agent = chat_client.create_agent(
        name="DestinationRecommenderAgent",
        instructions="You are a travel destination expert..."
    )

    # Itinerary Planner Agent (with tools)
    itinerary_planner_agent = chat_client.create_agent(
        name="ItineraryPlannerAgent",
        instructions="You are a travel itinerary planner...",
        tools=[get_exchange_rate, convert_currency]
    )

    # Local Recommendations Agent
    local_recommendations_agent = chat_client.create_agent(
        name="LocalRecommendationsAgent",
        instructions="You are a local expert..."
    )

    # Configure Function App with Durable Agents. AgentFunctionApp is where the magic happens
    app = AgentFunctionApp(agents=[
        destination_recommender_agent,
        itinerary_planner_agent,
        local_recommendations_agent
    ])

The Orchestration Programming Model

The durable task extension uses an intuitive async/await programming model for deterministic orchestration. You write orchestration logic as ordinary imperative code (if/else, try/catch), and the framework handles all the complexity of coordination, durability, retries, and distributed execution.

The Travel Planner Orchestration

Here's the orchestration from the application that coordinates all three agents, runs tasks in parallel, handles human approval, and books the trip:

C#

    [Function(nameof(RunTravelPlannerOrchestration))]
    public async Task<TravelPlanResult> RunTravelPlannerOrchestration(
        [OrchestrationTrigger] TaskOrchestrationContext context)
    {
        var travelRequest = context.GetInput<TravelRequest>()!;

        // Get durable agents and create conversation threads
        DurableAIAgent destinationAgent = context.GetAgent("DestinationRecommenderAgent");
        DurableAIAgent itineraryAgent = context.GetAgent("ItineraryPlannerAgent");
        DurableAIAgent localAgent = context.GetAgent("LocalRecommendationsAgent");

        // Step 1: Get destination recommendations
        var destinations = await destinationAgent.RunAsync<DestinationRecommendations>(
            $"Recommend destinations for {travelRequest.Preferences}",
            destinationAgent.GetNewThread());
        var topDestination = destinations.Result.Recommendations.First();

        // Steps 2 & 3: Run itinerary and local recommendations IN PARALLEL
        var itineraryTask = itineraryAgent.RunAsync<TravelItinerary>(
            $"Create itinerary for {topDestination.Name}",
            itineraryAgent.GetNewThread());
        var localTask = localAgent.RunAsync<LocalRecommendations>(
            $"Local recommendations for {topDestination.Name}",
            localAgent.GetNewThread());
        await Task.WhenAll(itineraryTask, localTask);

        // (Construction of travelPlan from the results above is omitted in this excerpt.)

        // Step 4: Save to blob storage
        await context.CallActivityAsync(nameof(SaveTravelPlanToBlob), travelPlan);

        // Step 5: Wait for human approval (NO COMPUTE COSTS while waiting!)
        var approval = await context.WaitForExternalEvent<ApprovalResponse>(
            "ApprovalEvent", TimeSpan.FromDays(7));

        // Step 6: Book if approved
        if (approval.Approved)
            await context.CallActivityAsync(nameof(BookTrip), travelPlan);

        return new TravelPlanResult(travelPlan, approval.Approved);
    }

Python

    @app.orchestration_trigger(context_name="context")
    def travel_planner_orchestration(context: df.DurableOrchestrationContext):
        travel_request_data = context.get_input()
        travel_request = TravelRequest(**travel_request_data)

        # Get durable agents and create conversation threads
        destination_agent = app.get_agent(context, "DestinationRecommenderAgent")
        itinerary_agent = app.get_agent(context, "ItineraryPlannerAgent")
        local_agent = app.get_agent(context, "LocalRecommendationsAgent")

        # Step 1: Get destination recommendations
        destinations_result = yield destination_agent.run(
            messages=f"Recommend destinations for {travel_request.preferences}",
            thread=destination_agent.get_new_thread(),
            response_format=DestinationRecommendations
        )
        destinations = cast(DestinationRecommendations, destinations_result.value)
        top_destination = destinations.recommendations[0]

        # Steps 2 & 3: Run itinerary and local recommendations IN PARALLEL
        itinerary_task = itinerary_agent.run(
            messages=f"Create itinerary for {top_destination.destination_name}",
            thread=itinerary_agent.get_new_thread(),
            response_format=Itinerary
        )
        local_task = local_agent.run(
            messages=f"Local recommendations for {top_destination.destination_name}",
            thread=local_agent.get_new_thread(),
            response_format=LocalRecommendations
        )
        results = yield context.task_all([itinerary_task, local_task])
        itinerary = cast(Itinerary, results[0].value)
        local_recs = cast(LocalRecommendations, results[1].value)

        # (Construction of travel_plan from the results above is omitted in this excerpt.)

        # Step 4: Save to blob storage
        yield context.call_activity("save_travel_plan_to_blob", travel_plan)

        # Step 5: Wait for human approval (NO COMPUTE COSTS while waiting!)
        approval_task = context.wait_for_external_event("ApprovalEvent")
        timeout_task = context.create_timer(
            context.current_utc_datetime + timedelta(days=7))
        winner = yield context.task_any([approval_task, timeout_task])

        if winner == approval_task:
            timeout_task.cancel()
            approval = approval_task.result

            # Step 6: Book if approved
            if approval.get("approved"):
                yield context.call_activity("book_trip", travel_plan)
            return TravelPlanResult(plan=travel_plan, approved=approval.get("approved"))

        # The timer fired first: treat the plan as not approved
        return TravelPlanResult(plan=travel_plan, approved=False)

Notice how the orchestration combines:

Agent calls (await agent.RunAsync(...)) for AI-driven decisions
Parallel execution (Task.WhenAll) for running multiple agents concurrently
Activity calls (await context.CallActivityAsync(...)) for regular, non-AI business tasks
Human-in-the-loop (await context.WaitForExternalEvent(...)) for approval workflows

The orchestration automatically checkpoints after each step. If a failure occurs, completed steps aren't re-executed. The orchestration resumes exactly where it left off, with no need for manual intervention.
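To build intuition for why completed steps aren't re-executed, here is a toy sketch of checkpoint-and-replay in plain Python. It is not the durable task extension's implementation: `run_with_checkpoints`, the `history` dictionary (standing in for durable storage), and the step names are all hypothetical. Each step's result is recorded as soon as it completes, and a rerun after a crash reads the recorded result instead of calling the step again.

```python
from typing import Any, Callable

# A step is a (name, function) pair; the function receives the results so far.
Step = tuple[str, Callable[[dict[str, Any]], Any]]


def run_with_checkpoints(steps: list[Step], history: dict[str, Any]) -> dict[str, Any]:
    """Run steps in order, skipping any whose result is already in history."""
    for name, fn in steps:
        if name in history:
            continue  # completed before the crash: reuse the checkpoint
        history[name] = fn(history)  # record the result as soon as it exists
    return history


calls: list[str] = []          # records which steps actually executed
fail_once = {"armed": True}    # simulates a single crash in the second step


def recommend(_: dict[str, Any]) -> str:
    calls.append("recommend")
    return "Lisbon"


def plan(results: dict[str, Any]) -> str:
    calls.append("plan")
    if fail_once["armed"]:
        fail_once["armed"] = False
        raise RuntimeError("simulated process crash")
    return f"3-day itinerary for {results['recommend']}"


steps: list[Step] = [("recommend", recommend), ("plan", plan)]
history: dict[str, Any] = {}   # stands in for durable storage

try:
    run_with_checkpoints(steps, history)
except RuntimeError:
    pass  # the "process" died after the first step was checkpointed

final = run_with_checkpoints(steps, history)  # restart: resumes at "plan"
print(calls)   # ['recommend', 'plan', 'plan']
print(final["plan"])
```

The `recommend` step runs only once even though the workflow was started twice, which is the property the orchestration relies on to avoid repeating agent calls after a failure.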
Agent Patterns in Action

Agent Chaining: Sequential Handoffs

The Travel Planner demonstrates agent chaining where the Destination Recommender's output feeds into both the Itinerary Planner and Local Recommendations agents:

C#

    // Agent 1: Get destination recommendations
    var destinations = await destinationAgent.RunAsync<DestinationRecommendations>(prompt, thread);
    var topDestination = destinations.Result.Recommendations.First();

    // Agent 2: Create itinerary based on Agent 1's output
    var itinerary = await itineraryAgent.RunAsync<TravelItinerary>(
        $"Create itinerary for {topDestination.Name}", thread);

Python

    # Agent 1: Get destination recommendations
    destinations_result = yield destination_agent.run(
        messages=prompt,
        thread=thread,
        response_format=DestinationRecommendations
    )
    destinations = cast(DestinationRecommendations, destinations_result.value)
    top_destination = destinations.recommendations[0]

    # Agent 2: Create itinerary based on Agent 1's output
    itinerary_result = yield itinerary_agent.run(
        messages=f"Create itinerary for {top_destination.destination_name}",
        thread=thread,
        response_format=Itinerary
    )
    itinerary = cast(Itinerary, itinerary_result.value)

Agent Parallelization: Concurrent Execution

The app runs the Itinerary Planner and Local Recommendations agents in parallel to reduce latency:

C#

    // Launch both agent calls simultaneously
    var itineraryTask = itineraryAgent.RunAsync<TravelItinerary>(itineraryPrompt, thread1);
    var localTask = localAgent.RunAsync<LocalRecommendations>(localPrompt, thread2);

    // Wait for both to complete
    await Task.WhenAll(itineraryTask, localTask);

Python

    # Launch both agent calls simultaneously
    itinerary_task = itinerary_agent.run(
        messages=itinerary_prompt,
        thread=thread1,
        response_format=Itinerary
    )
    local_task = local_agent.run(
        messages=local_prompt,
        thread=thread2,
        response_format=LocalRecommendations
    )

    # Wait for both to complete
    results = yield context.task_all([itinerary_task, local_task])
    itinerary = cast(Itinerary, results[0].value)
    local_recs = cast(LocalRecommendations, results[1].value)

Human-in-the-Loop: Approval Workflows

The Travel Planner includes a complete human-in-the-loop pattern. After generating the travel plan, the workflow pauses for user approval:

C#

    // Send approval request notification
    await context.CallActivityAsync(nameof(RequestApproval), travelPlan);

    // Wait for approval - NO COMPUTE COSTS OR LLM TOKENS while waiting!
    var approval = await context.WaitForExternalEvent<ApprovalResponse>(
        "ApprovalEvent", TimeSpan.FromDays(7));

    if (approval.Approved)
        await context.CallActivityAsync(nameof(BookTrip), travelPlan);

Python

    # Send approval request notification
    # (the activity name "request_approval" is assumed here, mirroring the C# RequestApproval activity)
    yield context.call_activity("request_approval", travel_plan)

    # Wait for approval - NO COMPUTE COSTS OR LLM TOKENS while waiting!
    approval_task = context.wait_for_external_event("ApprovalEvent")
    timeout_task = context.create_timer(
        context.current_utc_datetime + timedelta(days=7))
    winner = yield context.task_any([approval_task, timeout_task])

    if winner == approval_task:
        timeout_task.cancel()
        approval = approval_task.result
        if approval.get("approved"):
            yield context.call_activity("book_trip", travel_plan)

The API endpoint to handle approval responses:

C#

    [Function(nameof(HandleApprovalResponse))]
    public async Task HandleApprovalResponse(
        [HttpTrigger("post", Route = "approve/{instanceId}")] HttpRequestData req,
        string instanceId,
        [DurableClient] DurableTaskClient client)
    {
        var approval = await req.ReadFromJsonAsync<ApprovalResponse>();
        await client.RaiseEventAsync(instanceId, "ApprovalEvent", approval);
    }

Python

    @app.function_name(name="ApproveTravelPlan")
    @app.route(route="travel-planner/approve/{instance_id}", methods=["POST"])
    @app.durable_client_input(client_name="client")
    async def approve_travel_plan(req: func.HttpRequest, client) -> func.HttpResponse:
        instance_id = req.route_params.get("instance_id")
        approval = req.get_json()
        await client.raise_event(instance_id, "ApprovalEvent", approval)
        return func.HttpResponse(
            json.dumps({"message": "Approval processed"}),
            status_code=200,
            mimetype="application/json"
        )

The workflow generates a complete travel plan and waits up to 7 days for user approval. During this entire waiting period, since we're hosting this application on the Functions Flex Consumption plan, the app scales down and zero compute resources or LLM tokens are consumed. When the user approves (or the timeout expires), the app scales back up and the orchestration automatically resumes with full context intact.

Real-Time Monitoring with the Durable Task Scheduler

Since we're using the Durable Task Scheduler as the backend for our durable agents, we get a built-in dashboard for monitoring our agents and orchestrations in real time.

Agent Thread Insights

Conversation history: View complete conversation threads for each agent session, including all messages, tool calls, and agent decisions
Task timing: Monitor how long specific tasks and agent interactions take to complete

Orchestration Insights

Multi-agent visualization: See the execution flow across multiple agents with visual representation of parallel executions and branching
Real-time monitoring: Track active orchestrations, queued work items, and agent states
Performance metrics: Monitor response times, token usage, and orchestration duration

Debugging Capabilities

View structured inputs and outputs for activities, agents, and tool calls
Trace tool invocations and their outcomes
Monitor external event handling for human-in-the-loop scenarios

The dashboard enables you to understand exactly what your agents and workflows are doing, diagnose issues quickly, and optimize performance, all without adding custom logging to your code.

Try The Travel Planner Application

That’s it! That’s the gist of how the AI Travel Planner application is put together and some of the key components that it took to build it. I'd love for you to try the application out for yourself. It's fully instrumented with the Azure Developer CLI and Bicep, so you can deploy it to Azure with a few simple CLI commands.

Click here to try the AI Travel Planner sample

Python version coming soon!
Demo Video

Learn More

Durable Task Extension Overview
Durable Agent Features
Durable Task Scheduler
AI Travel Planner GitHub Repository

Call Function App from Azure Data Factory with Managed Identity Authentication
Integrating Azure Function Apps into your Azure Data Factory (ADF) workflows is a common practice. To enhance security beyond the use of function API keys, managed identity authentication is strongly recommended. Because many existing guides have been outdated by recent updates to Azure services, this article provides a comprehensive, up-to-date walkthrough on configuring managed identity in ADF to securely call Function Apps. The methods described can also be adapted to other Azure services that need to call Function Apps with managed identity authentication.

The high-level process is:

Enable Managed Identity on Data Factory
Configure Microsoft Entra Sign-in on Azure Function App
Configure Linked Service in Data Factory
Assign Permissions to the Data Factory in Azure Function

Step 1: Enable Managed Identity on Data Factory

On the Data Factory’s portal, go to Managed Identities, and enable a system-assigned managed identity.

Step 2: Configure Microsoft Entra Sign-in on Azure Function App

1. Go to the Function App portal and enable Authentication. Choose "Microsoft" as the identity provider.
2. Add an app registration to the app. It can be an existing one, or you can let the platform create a new app registration.
3. Next, allow the ADF as a client application to authenticate to the function app. This step is a new requirement compared with previous guides. If these settings are not set correctly, a 403 response will be returned. Add the Application ID of the ADF managed identity in Allowed client applications and the Object ID of the ADF managed identity in Allowed identities. If requests are only allowed from specific tenants, add the Tenant ID of the managed identity in the last box.
4. This part sets the function app's response to unauthenticated requests. Set the response to "HTTP 401 Unauthorized: recommended for APIs", as a sign-in page is not feasible for API calls from ADF.
5. Then, click Next and use the default permission option.
6. After everything is set, click "Add" to complete the configuration. Copy the generated App (client) ID, as it is used in Data Factory to handle authorization.

Step 3: Configure Linked Service in Data Factory

1. To use an Azure Function activity in a pipeline, follow the steps here: Create an Azure Function activity with UI
2. Then edit an existing Azure Function linked service or create a new one.
3. Change the authentication method to System Assigned Managed Identity, and paste the client ID of the function app identity provider (copied in Step 2) into Resource ID. This step is necessary; authorization does not work without it.

Step 4: Assign Permissions to the Data Factory in Azure Function

1. On the function app portal, go to Access control (IAM), and add a new role assignment.
2. Assign the Reader role.
3. Assign the Data Factory’s managed identity to that role.

After everything is set, test that the function app can be called from Azure Data Factory successfully.

Reference:
https://prodata.ie/2022/06/16/enabling-managed-identity-authentication-on-azure-functions-in-data-factory/
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity
https://docs.azure.cn/en-us/app-service/overview-authentication-authorization

Industry-Wide Certificate Changes Impacting Azure App Service Certificates
Executive Summary

In early 2026, industry-wide changes mandated by browser applications and the CA/B Forum will affect both how TLS certificates are issued and their validity period. The CA/B Forum is a vendor body that establishes standards for securing websites and online communications through SSL/TLS certificates. Azure App Service is aligning with these standards for both App Service Managed Certificates (ASMC, free, DigiCert-issued) and App Service Certificates (ASC, paid, GoDaddy-issued). Most customers will experience no disruption. Action is required only if you pin certificates or use them for client authentication (mTLS).

Who Should Read This?

App Service administrators
Security and compliance teams
Anyone responsible for certificate management or application security

Quick Reference: What’s Changing & What To Do

| Topic | ASMC (Managed, free) | ASC (GoDaddy, paid) | Required Action |
| --- | --- | --- | --- |
| New Cert Chain | New chain (no action unless pinned) | New chain (no action unless pinned) | Remove certificate pinning |
| Client Auth EKU | Not supported (no action unless cert is used for mTLS) | Not supported (no action unless cert is used for mTLS) | Transition from mTLS |
| Validity | No change (already compliant) | Two overlapping certs issued for the full year | None (automated) |

If you do not pin certificates or use them for mTLS, no action is required.

Timeline of Key Dates

| Date | Change | Action Required |
| --- | --- | --- |
| Mid-Jan 2026 and after | ASMC migrates to new chain; ASMC stops supporting client auth EKU | Remove certificate pinning if used; transition to alternative authentication if the certificate is used for mTLS |
| Mar 2026 and after | ASC validity shortened; ASC migrates to new chain; ASC stops supporting client auth EKU | Remove certificate pinning if used; transition to alternative authentication if the certificate is used for mTLS |

Actions Checklist

For All Users

Review your use of App Service certificates. If you do not pin these certificates and do not use them for mTLS, no action is required.
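The quick reference above can be condensed into a small decision helper. This is an illustrative sketch only: `required_actions` and its parameters are hypothetical names, and the returned strings paraphrase the guidance in this post rather than quote any Azure API.

```python
def required_actions(pins_certificate: bool, uses_mtls: bool) -> list[str]:
    """Map how an app uses its App Service certificate to the actions in the quick reference."""
    actions: list[str] = []
    if pins_certificate:
        # New certificate chain: pinned certificates or chains will stop matching.
        actions.append("Remove certificate pinning")
    if uses_mtls:
        # The client authentication EKU is no longer supported on these certificates.
        actions.append("Transition from mTLS to an alternative authentication method")
    return actions  # an empty list means no action is required


print(required_actions(pins_certificate=False, uses_mtls=False))  # []
print(required_actions(pins_certificate=True, uses_mtls=True))
```

The validity change does not appear in the helper because, per the table, it is handled automatically for both ASMC and ASC.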
If You Pin Certificates (ASMC or ASC)

Remove all certificate or chain pinning before the respective key change dates to avoid service disruption.
See Best Practices: Certificate Pinning.

If You Use Certificates for Client Authentication (mTLS)

Switch to an alternative authentication method before the respective key change dates to avoid service disruption, as the client authentication EKU will no longer be supported for these certificates.
See Sunsetting the client authentication EKU from DigiCert public TLS certificates.
See Set Up TLS Mutual Authentication - Azure App Service

Details & Rationale

Why Are These Changes Happening?

These updates are required by major browser programs (e.g., Chrome) and apply to all public CAs. They are designed to enhance security and compliance across the industry. Azure App Service is automating updates to minimize customer impact.

What’s Changing?

New Certificate Chain
Certificates will be issued from a new chain to maintain browser trust.
Impact: Remove any certificate pinning to avoid disruption.

Removal of Client Authentication EKU
Newly issued certificates will not support the client authentication EKU. This change aligns with Google Chrome’s root program requirements to enhance security.
Impact: If you use these certificates for mTLS, transition to an alternate authentication method.

Shortening of Certificate Validity
Certificate validity is now limited to a maximum of 200 days.
Impact: ASMC is already compliant; ASC will automatically issue two overlapping certificates to cover one year. No billing impact.

Frequently Asked Questions (FAQs)

Will I lose coverage due to shorter validity?
No. For App Service Certificate, App Service will issue two certificates to span the full year you purchased.

Is this unique to DigiCert and GoDaddy?
No. This is an industry-wide change.

Do these changes impact certificates from other CAs?
Yes. These changes are industry-wide.
We recommend you reach out to your certificates’ CA for more information. Do I need to act today? If you do not pin or use these certs for mTLS, no action is required. Glossary ASMC: App Service Managed Certificate (free, DigiCert-issued) ASC: App Service Certificate (paid, GoDaddy-issued) EKU: Extended Key Usage mTLS: Mutual TLS (client certificate authentication) CA/B Forum: Certification Authority/Browser Forum Additional Resources Changes to the Managed TLS Feature Set Up TLS Mutual Authentication Azure App Service Best Practices – Certificate pinning DigiCert Root and Intermediate CA Certificate Updates 2023 Sunsetting the client authentication EKU from DigiCert public TLS certificates Feedback & Support If you have questions or need help, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.631Views1like0CommentsImportant Changes to App Service Managed Certificates: Is Your Certificate Affected?
Overview

As part of an upcoming industry-wide change, DigiCert, the Certificate Authority (CA) for Azure App Service Managed Certificates (ASMC), is required to migrate to a new validation platform to meet multi-perspective issuance corroboration (MPIC) requirements. While most certificates will not be impacted by this change, certain site configurations and setups may prevent certificate issuance or renewal starting July 28, 2025.

Update December 8, 2025

In November, we published an update explaining that App Service Managed Certificates can now be supported on sites that block public access. This reverses the limitation introduced in July 2025 that is described in this blog. Note: This blog post reflects a point-in-time update and will not be revised. For the latest and most accurate details on App Service Managed Certificates, please refer to official documentation or subsequent updates. Learn more about the November 2025 update here: Follow-Up to ‘Important Changes to App Service Managed Certificates’: November 2025 Update.

August 5, 2025

We’ve published a Microsoft Learn article titled App Service Managed Certificate (ASMC) changes – July 28, 2025 that contains more in-depth mitigation guidance and a growing FAQ section to support the changes outlined in this blog post. While the blog currently contains the most complete overview, the documentation will soon be updated to reflect all blog content. Going forward, any new information or clarifications will be added to the documentation page, so we recommend bookmarking it for the latest guidance.

What Will the Change Look Like?

For most customers: No disruption. Certificate issuance and renewals will continue as expected for eligible site configurations.

For impacted scenarios: Certificate requests will fail (no certificate issued) starting July 28, 2025, if your site configuration is not supported. Existing certificates will remain valid until their expiration (up to six months after last renewal).
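Because existing certificates stay valid until they expire, it can help to know how much runway each one has. The sketch below is illustrative only: it assumes you have exported rows with `certName` and `certExpiration` fields (the column names used by the ARG queries later in this post), that expirations are ISO 8601 strings, and that a 45-day warning window suits you. The window and the helper names are assumptions, not platform values.

```python
from datetime import datetime, timedelta, timezone


def parse_expiration(value: str) -> datetime:
    """Parse an ISO 8601 timestamp; a trailing 'Z' or a missing offset is treated as UTC."""
    parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def expiring_within(certs, now, window_days=45):
    """Return (certName, whole days left) for certificates expiring inside the window.

    Already-expired certificates are included with a negative day count.
    Results are sorted with the most urgent certificate first.
    """
    cutoff = now + timedelta(days=window_days)
    flagged = []
    for cert in certs:
        expires = parse_expiration(cert["certExpiration"])
        if expires <= cutoff:
            flagged.append((cert["certName"], (expires - now).days))
    return sorted(flagged, key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical rows, shaped like the output of the ARG queries in this post.
    sample = [
        {"certName": "soon-cert", "certExpiration": "2025-08-15T00:00:00Z"},
        {"certName": "later-cert", "certExpiration": "2026-01-10T00:00:00Z"},
    ]
    for name, days_left in expiring_within(sample, datetime.now(timezone.utc)):
        print(f"{name}: {days_left} day(s) left")
```

Passing `now` explicitly keeps the function deterministic, which makes it easier to check against a known date such as July 28, 2025.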
Impacted Scenarios You will be affected by this change if any of the following apply to your site configurations: Your site is not publicly accessible: Public accessibility to your app is required. If your app is only accessible privately (e.g., requiring a client certificate for access, disabling public network access, using private endpoints or IP restrictions), you will not be able to create or renew a managed certificate. Other site configurations or setup methods not explicitly listed here that restrict public access, such as firewalls, authentication gateways, or any custom access policies, can also impact eligibility for managed certificate issuance or renewal. Action: Ensure your app is accessible from the public internet. However, if you need to limit access to your app, then you must acquire your own SSL certificate and add it to your site. Your site uses Azure Traffic Manager "nested" or "external" endpoints: Only “Azure Endpoints” on Traffic Manager will be supported for certificate creation and renewal. “Nested endpoints” and “External endpoints” will not be supported. Action: Transition to using "Azure Endpoints". However, if you cannot, then you must obtain a different SSL certificate for your domain and add it to your site. Your site relies on *.trafficmanager.net domain: Certificates for *.trafficmanager.net domains will not be supported for creation or renewal. Action: Add a custom domain to your app and point the custom domain to your *.trafficmanager.net domain. After that, secure the custom domain with a new SSL certificate. If none of the above applies, no further action is required. How to Identify Impacted Resources? To assist with the upcoming changes, you can use Azure Resource Graph (ARG) queries to help identify resources that may be affected under each scenario. Please note that these queries are provided as a starting point and may not capture every configuration. Review your environment for any unique setups or custom configurations. 
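If you would rather run the queries in the scenarios below from a script than from Azure Resource Graph Explorer, one option is the Azure CLI's `az graph query` command, which is provided by the resource-graph extension. This is a minimal sketch, assuming the CLI is installed and signed in. The helper names are hypothetical, and the shape of the JSON output (a top-level `data` list) is an assumption that may vary by extension version.

```python
import json
import subprocess


def build_arg_command(query: str, first: int = 100) -> list[str]:
    """Build the argument list for `az graph query`; nothing is executed here."""
    return ["az", "graph", "query", "-q", query, "--first", str(first), "--output", "json"]


def run_arg_query(query: str, first: int = 100) -> list:
    """Run the query through the Azure CLI and return the result rows.

    Assumes the CLI prints a JSON object whose "data" key holds the rows.
    Raises subprocess.CalledProcessError if the CLI exits with an error.
    """
    completed = subprocess.run(
        build_arg_command(query, first),
        capture_output=True,
        text=True,
        check=True,
    )
    payload = json.loads(completed.stdout)
    return payload.get("data", [])


if __name__ == "__main__":
    # Placeholder query; paste one of the scenario queries here instead.
    # Keep each `//` comment on its own line, since a KQL comment runs to end of line.
    rows = run_arg_query('resources | where type == "microsoft.web/certificates" | limit 5')
    for row in rows:
        print(row.get("name"))
```

Passing the query as a single list element avoids shell quoting problems with the double quotes and pipes that the queries contain.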
Scenario 1: Sites Not Publicly Accessible

This ARG query retrieves a list of sites that either have the public network access property disabled or are configured to use client certificates. It then filters for sites that are using App Service Managed Certificates (ASMC) for their custom hostname SSL bindings. These certificates are the ones that could be affected by the upcoming changes. However, please note that this query does not provide complete coverage, as there may be additional configurations impacting public access to your app that are not included here. Ultimately, this query serves as a helpful guide for users, but a thorough review of your environment is recommended.

You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.

// ARG Query: Identify App Service sites that commonly restrict public access and use ASMC for custom hostname SSL bindings
resources
| where type == "microsoft.web/sites"
// Extract relevant properties for public access and client certificate settings
| extend
    publicNetworkAccess = tolower(tostring(properties.publicNetworkAccess)),
    clientCertEnabled = tolower(tostring(properties.clientCertEnabled))
// Filter for sites that either have public network access disabled
// or have client certificates enabled (both can restrict public access)
| where publicNetworkAccess == "disabled" or clientCertEnabled != "false"
// Expand the list of SSL bindings for each site
| mv-expand hostNameSslState = properties.hostNameSslStates
| extend
    hostName = tostring(hostNameSslState.name),
    thumbprint = tostring(hostNameSslState.thumbprint)
// Only consider custom domains (exclude default *.azurewebsites.net) and sites with an SSL certificate bound
| where tolower(hostName) !endswith "azurewebsites.net" and isnotempty(thumbprint)
// Select key site properties for output
| project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint, publicNetworkAccess, clientCertEnabled
// Join with certificates to find only those using App Service Managed Certificates (ASMC)
// ASMCs are identified by the presence of the "canonicalName" property
| join kind=inner (
    resources
    | where type == "microsoft.web/certificates"
    | extend
        certThumbprint = tostring(properties.thumbprint),
        canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property
    | where isnotempty(canonicalName)
    | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
) on $left.thumbprint == $right.certThumbprint
// Final output: sites with restricted public access and using ASMC for custom hostname SSL bindings
| project siteName, siteId, siteResourceGroup, publicNetworkAccess, clientCertEnabled, thumbprint, certName, certId, certResourceGroup, certExpiration, canonicalName

Scenario 2: Traffic Manager Endpoint Types

For this scenario, please manually review your Traffic Manager profile configurations to ensure only “Azure Endpoints” are in use. We recommend inspecting your Traffic Manager profiles directly in the Azure portal or using relevant APIs to confirm your setup and ensure compliance with the new requirements.

Scenario 3: Certificates Issued to *.trafficmanager.net Domains

This ARG query helps you identify App Service Managed Certificates (ASMC) that were issued to *.trafficmanager.net domains. It also checks whether any web apps are currently using those certificates for custom domain SSL bindings.

You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.
// ARG Query: Identify App Service Managed Certificates (ASMC) issued to *.trafficmanager.net domains
// Also checks if any web apps are currently using those certificates for custom domain SSL bindings
resources
| where type == "microsoft.web/certificates"
// Extract the certificate thumbprint and canonicalName (ASMCs have a canonicalName property)
| extend
    certThumbprint = tostring(properties.thumbprint),
    canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property
// Filter for certificates issued to *.trafficmanager.net domains
| where canonicalName endswith "trafficmanager.net"
// Select key certificate properties for output
| project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
// Join with web apps to see if any are using these certificates for SSL bindings
| join kind=leftouter (
    resources
    | where type == "microsoft.web/sites"
    // Expand the list of SSL bindings for each site
    | mv-expand hostNameSslState = properties.hostNameSslStates
    | extend
        hostName = tostring(hostNameSslState.name),
        thumbprint = tostring(hostNameSslState.thumbprint)
    // Only consider bindings for *.trafficmanager.net custom domains with a certificate bound
    | where tolower(hostName) endswith "trafficmanager.net" and isnotempty(thumbprint)
    // Select key site properties for output
    | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint
) on $left.certThumbprint == $right.thumbprint
// Final output: ASMCs for *.trafficmanager.net domains and any web apps using them
| project certName, certId, certResourceGroup, certExpiration, canonicalName, siteName, siteId, siteResourceGroup

Ongoing Updates

We will continue to update this post with any new queries or important changes as they become available. Be sure to check back for the latest information.
Note on Comments

We hope this information helps you navigate the upcoming changes. To keep this post clear and focused, comments are closed. If you have questions, need help, or want to share tips or alternative detection methods, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.

Follow-Up to ‘Important Changes to App Service Managed Certificates’: November 2025 Update
This post provides an update to the Tech Community article ‘Important Changes to App Service Managed Certificates: Is Your Certificate Affected?’ and covers the latest changes introduced since July 2025. With the November 2025 update, ASMC now remains supported even if the site is not publicly accessible, provided all other requirements are met. Details on requirements, exceptions, and validation steps are included below. Background Context to July 2025 Changes As of July 2025, all ASMC certificate issuance and renewals use HTTP token validation. Previously, public access was required because DigiCert needed to access the endpoint https://<hostname>/.well-known/pki-validation/fileauth.txt to verify the token before issuing the certificate. App Service automatically places this token during certificate creation and renewal. If DigiCert cannot access this endpoint, domain ownership validation fails, and the certificate cannot be issued. November 2025 Update Starting November 2025, App Service now allows DigiCert's requests to the https://<hostname>/.well-known/pki-validation/fileauth.txt endpoint, even if the site blocks public access. If there’s a request to create an App Service Managed Certificate (ASMC), App Service places the domain validation token at the validation endpoint. When DigiCert tries to reach the validation endpoint, App Service front ends present the token, and the request terminates at the front end layer. DigiCert's request does not reach the workers running the application. This behavior is now the default for ASMC issuance for initial certificate creation and renewals. Customers do not need to specifically allow DigiCert's IP addresses. Exceptions and Unsupported Scenarios This update addresses most scenarios that restrict public access, including App Service Authentication, disabling public access, IP restrictions, private endpoints, and client certificates. However, a public DNS record is still required. 
For example, sites using a private endpoint with a custom domain on a private DNS cannot validate domain ownership and obtain a certificate.

Even with all validations now relying on HTTP token validation and DigiCert requests being allowed through, certain configurations are still not supported for ASMC:

- Sites configured as "Nested" or "External" endpoints behind Traffic Manager. Only "Azure" endpoints are supported.
- Certificates requested for domains ending in *.trafficmanager.net are not supported.

Testing

Customers can easily test whether their site’s configuration or set-up supports ASMC by attempting to create one for their site. If the initial request succeeds, renewals should also work, provided all requirements are met and the site is not listed in an unsupported scenario.

Expanding the Public Preview of the Azure SRE Agent
We are excited to share that the Azure SRE Agent is now available in public preview for everyone instantly – no sign up required. A big thank you to all our preview customers who provided feedback and helped shape this release! Watching teams put the SRE Agent to work taught us a ton, and we’ve baked those lessons into a smarter, more resilient, and enterprise-ready experience. You can now find Azure SRE Agent directly in the Azure Portal and get started, or use the link below. 📖 Learn more about SRE Agent. 👉 Create your first SRE Agent (Azure login required) What’s New in Azure SRE Agent - October Update The Azure SRE Agent now delivers secure-by-default governance, deeper diagnostics, and extensible automation—built for scale. It can even resolve incidents autonomously by following your team’s runbooks. With native integrations across Azure Monitor, GitHub, ServiceNow, and PagerDuty, it supports root cause analysis using both source code and historical patterns. And since September 1, billing and reporting are available via Azure Agent Units (AAUs). Please visit product documentation for the latest updates. Here are a few highlights for this month: Prioritizing enterprise governance and security: By default, the Azure SRE Agent operates with least-privilege access and never executes write actions on Azure resources without explicit human approval. Additionally, it uses role-based access control (RBAC) so organizations can assign read-only or approver roles, providing clear oversight and traceability from day one. This allows teams to choose their desired level of autonomy from read-only insights to approval-gated actions to full automation without compromising control. Covering the breadth and depth of Azure: The Azure SRE Agent helps teams manage and understand their entire Azure footprint. With built-in support for AZ CLI and kubectl, it works across all Azure services. 
But it doesn’t stop there—diagnostics are enhanced for platforms like PostgreSQL, API Management, Azure Functions, AKS, Azure Container Apps, and Azure App Service. Whether you're running microservices or managing monoliths, the agent delivers consistent automation and deep insights across your cloud environment. Automating Incident Management: The Azure SRE Agent now plugs directly into Azure Monitor, PagerDuty, and ServiceNow to streamline incident detection and resolution. These integrations let the Agent ingest alerts and trigger workflows that match your team’s existing tools—so you can respond faster, with less manual effort. Engineered for extensibility: The Azure SRE Agent incident management approach lets teams reuse existing runbooks and customize response plans to fit their unique workflows. Whether you want to keep a human in the loop or empower the Agent to autonomously mitigate and resolve issues, the choice is yours. This flexibility gives teams the freedom to evolve—from guided actions to trusted autonomy—without ever giving up control. Root cause, meet source code: The Azure SRE Agent now supports code-aware root cause analysis (RCA) by linking diagnostics directly to source context in GitHub and Azure DevOps. This tight integration helps teams trace incidents back to the exact code changes that triggered them—accelerating resolution and boosting confidence in automated responses. By bridging operational signals with engineering workflows, the agent makes RCA faster, clearer, and more actionable. Close the loop with DevOps: The Azure SRE Agent now generates incident summary reports directly in GitHub and Azure DevOps—complete with diagnostic context. These reports can be assigned to a GitHub Copilot coding agent, which automatically creates pull requests and merges validated fixes. Every incident becomes an actionable code change, driving permanent resolution instead of temporary mitigation. 
Getting Started

- Start here: Create a new SRE Agent in the Azure portal (Azure login required)
- Blog: Announcing a flexible, predictable billing model for Azure SRE Agent
- Blog: Enterprise-ready and extensible – Update on the Azure SRE Agent preview
- Product documentation
- Product home page

Community & Support

We’d love to hear from you! Please use our GitHub repo to file issues, request features, or share feedback with the team.

Announcing a flexible, predictable billing model for Azure SRE Agent
Billing for Azure SRE Agent will start on September 1, 2025. Announced at Microsoft Build 2025, Azure SRE Agent is a pre-built AI agent for root cause analysis, uptime improvement, and operational cost reduction. Learn more about the billing model and example scenarios.

Reimagining AI Ops with Azure SRE Agent: New Automation, Integration, and Extensibility features
Azure SRE Agent offers intelligent and context aware automation for IT operations. Enhanced by customer feedback from our preview, the SRE Agent has evolved into an extensible platform to automate and manage tasks across Azure and other environments. Built on an Agentic DevOps approach - drawing from proven practices in internal Azure operations - the Azure SRE Agent has already saved over 20,000 engineering hours across Microsoft product teams operations, delivering strong ROI for teams seeking sustainable AIOps. An Operations Agent that adapts to your playbooks Azure SRE Agent is an AI powered operations automation platform that empowers SREs, DevOps, IT operations, and support teams to automate tasks such as incident response, customer support, and developer operations from a single, extensible agent. Its value proposition and capabilities have evolved beyond diagnosis and mitigation of Azure issues, to automating operational workflows and seamless integration with the standards and processes used in your organization. SRE Agent is designed to automate operational work and reduce toil, enabling developers and operators to focus on high-value tasks. By streamlining repetitive and complex processes, SRE Agent accelerates innovation and improves reliability across cloud and hybrid environments. In this article, we will look at what’s new and what has changed since the last update. What’s New: Automation, Integration, and Extensibility Azure SRE Agent just got a major upgrade. From no-code automation to seamless integrations and expanded data connectivity, here’s what’s new in this release: No-code Sub-Agent Builder: Rapidly create custom automations without writing code. Flexible, event-driven triggers: Instantly respond to incidents and operational changes. Expanded data connectivity: Unify diagnostics and troubleshooting across more data sources. Custom actions: Integrate with your existing tools and orchestrate end-to-end workflows via MCP. 
Prebuilt operational scenarios: Accelerate deployment and improve reliability out of the box. Unlike generic agent platforms, Azure SRE Agent comes with deep integrations, prebuilt tools, and frameworks specifically for IT, DevOps, and SRE workflows. This means you can automate complex operational tasks faster and more reliably, tailored to your organization’s needs. Sub-Agent Builder: Custom Automation, No Code Required Empower teams to automate repetitive operational tasks without coding expertise, dramatically reducing manual workload and development cycles. This feature helps address the need for targeted automation, letting teams solve specific operational pain points without relying on one-size-fits-all solutions. Modular Sub-Agents: Easily create custom sub-agents tailored to your team’s needs. Each sub-agent can have its own instructions, triggers, and toolsets, letting you automate everything from outage response to customer email triage. Prebuilt System Tools: Eliminate the inefficiency of creating basic automation from scratch, and choose from a rich library of hundreds of built-in tools for Azure operations, code analysis, deployment management, diagnostics, and more. Custom Logic: Align automation to your unique business processes by defining your automation logic and prompts, teaching the agent to act exactly as your workflow requires. Flexible Triggers: Automate on Your Terms Invoke the agent to respond automatically to mission-critical events, not wait for manual commands. This feature helps speed up incident response and eliminate missed opportunities for efficiency. Multi-Source Triggers: Go beyond chat-based interactions, and trigger the agent to automatically respond to Incident Management and Ticketing systems like PagerDuty and ServiceNow, Observability Alerting systems like Azure Monitor Alerts, or even on a cron-based schedule for proactive monitoring and best-practices checks. 
Additional trigger sources such as GitHub issues, Azure DevOps pipelines, email, etc. will be added over time. This means automation can start exactly when and where you need it. Event-Driven Operations: Integrate with your CI/CD, monitoring, or support systems to launch automations in response to real-world events - like deployments, incidents, or customer requests. Vital for reducing downtime, it ensures that business-critical actions happen automatically and promptly. Expanded Data Connectivity: Unified Observability and Troubleshooting Integrate data, enabling comprehensive diagnostics and troubleshooting and faster, more informed decision-making by eliminating silos and speeding up issue resolution. Multiple Data Sources: The agent can now read data from Azure Monitor, Log Analytics, and Application Insights based on its Azure role-based access control (RBAC). Additional observability data sources such as Dynatrace, New Relic, Datadog, and more can be added via the Remote Model Context Protocol (MCP) servers for these tools. This gives you a unified view for diagnostics and automation. Knowledge Integration: Rather than manually detailing every instruction in your prompt, you can upload your Troubleshooting Guide (TSG) or Runbook directly, allowing the agent to automatically create an execution plan from the file. You may also connect the agent to resources like SharePoint, Jira, or documentation repositories through Remote MCP servers, enabling it to retrieve needed files on its own. This approach utilizes your organization’s existing knowledge base, streamlining onboarding and enhancing consistency in managing incidents. Azure SRE Agent is also building multi-agent collaboration by integrating with PagerDuty and Neubird, enabling advanced, cross-platform incident management and reliability across diverse environments. 
Custom Actions: Automate Anything, Anywhere Extend automation beyond Azure and integrate with any tool or workflow, solving the problem of limited automation scope and enabling end-to-end process orchestration. Out-of-the-Box Actions: Instantly automate common tasks like running azcli, kubectl, creating GitHub issues, or updating Azure resources, reducing setup time and operational overhead. Communication Notifications: The SRE Agent now features built-in connectors for Outlook, enabling automated email notifications, and for Microsoft Teams, allowing it to post messages directly to Teams channels for streamlined communication. Bring Your Own Actions: Drop in your own Remote MCP servers to extend the agent’s capabilities to any custom tool or workflow. Future-proof your agentic DevOps by automating proprietary or emerging processes with confidence. Prebuilt Operations Scenarios Address common operational challenges out of the box, saving teams time and effort while improving reliability and customer satisfaction. Incident Response: Minimize business impact and reduce operational risk by automating detection, diagnosis, and mitigation of your workload stack. The agent has built-in runbooks for common issues related to many Azure resource types including Azure Kubernetes Service (AKS), Azure Container Apps (ACA), Azure App Service, Azure Logic Apps, Azure Database for PostgreSQL, Azure CosmosDB, Azure VMs, etc. Support for additional resource types is being added continually, please see product documentation for the latest information. Root Cause Analysis & IaC Drift Detection: Instantly pinpoint incident causes with AI-driven root cause analysis including automated source code scanning via GitHub and Azure DevOps integration. Proactively detect and resolve infrastructure drift by comparing live cloud environments against source-controlled IaC, ensuring configuration consistency and compliance. 
Handle Complex Investigations: Enable the deep investigation mode that uses a hypothesis-driven method to analyze possible root causes. It collects logs and metrics, tests hypotheses with iterative checks, and documents findings. The process delivers a clear summary and actionable steps to help teams accurately resolve critical issues. Incident Analysis: The integrated dashboard offers a comprehensive overview of all incidents managed by the SRE Agent. It presents essential metrics, including the number of incidents reviewed, assisted, and mitigated by the agent, as well as those awaiting human intervention. Users can leverage aggregated visualizations and AI-generated root cause analyses to gain insights into incident processing, identify trends, enhance response strategies, and detect areas for improvement in incident management. Inbuilt Agent Memory: The new SRE Agent Memory System transforms incident response by institutionalizing the expertise of top SREs - capturing, indexing, and reusing critical knowledge from past incidents, investigations, and user guidance. Benefit from faster, more accurate troubleshooting, as the agent learns from both successes and mistakes, surfacing relevant insights, runbooks, and mitigation strategies exactly when needed. This system leverages advanced retrieval techniques and a domain-aware schema to ensure every on-call engagement is smarter than the last, reducing mean time to resolution (MTTR) and minimizing repeated toil. Automatically gain a continuously improving agent that remembers what works, avoids past pitfalls, and delivers actionable guidance tailored to the environment. GitHub Copilot and Azure DevOps Integration: Automatically triage, respond to, and resolve issues raised in GitHub or Azure DevOps. Integration with modern development platforms such as GitHub Copilot coding agent increases efficiency and ensures that issues are resolved faster, reducing bottlenecks in the development lifecycle. Ready to get started? 
- Azure SRE Agent home page
- Product overview
- Pricing Page
- Pricing Calculator
- Pricing Blog
- Demo recordings
- Deployment samples

What’s Next?

Give us feedback: Your feedback is critical. You can give a thumbs up or thumbs down on each interaction or thread, use the “Give Feedback” button in the agent to send in-product feedback, or create issues and share your thoughts in our GitHub repo at https://github.com/microsoft/sre-agent.

We’re just getting started. In the coming months, expect even more prebuilt integrations, expanded data sources, and new automation scenarios. We anticipate continuous growth and improvement throughout our agentic AI platforms and services to effectively address customer needs and preferences. Let us know what Ops toil you want to automate next!

Proactive Monitoring Made Simple with Azure SRE Agent
SRE teams strive for proactive operations, catching issues before they impact customers and reducing the chaos of incident response. While perfection may be elusive, the real goal is minimizing outages and gaining immediate line of sight into production environments. Today, that’s harder than ever. It requires correlating countless signals and alerts, understanding how they relate—or don’t relate—to each other, and assigning the right sense of urgency and impact. Anything that shortens this cycle, accelerates detection, and enables automated remediation is what modern SRE teams crave. What if you could skip the scripting and pipelines? What if you could simply describe what you want in plain language and let it run automatically on a schedule? Scheduled Tasks for Azure SRE Agent With Scheduled Tasks for Azure SRE Agent, that what-if scenario is now a reality. Scheduled tasks combine natural language prompts with Azure SRE Agent’s automation capabilities, so you can express intent, set a schedule, and let the agent do the rest—without writing a single line of code. This means: ⚡ Faster incident response through early detection ✅ Better compliance with automated checks 🎯 More time for high-value engineering work and innovation 💡 The shift from reactive to proactive: Instead of waiting for alerts to fire or customers to report issues, you’re continuously monitoring, validating, and catching problems before they escalate. How Scheduled Tasks Work Under the Hood When you create a Scheduled Task, the process is more than just running a prompt on a timer. Here’s what happens: 1. Prompt Interpretation and Plan Creation The SRE Agent takes your natural language prompt—such as “Scan all resources for security best practices”—and converts it into a structured execution plan. This plan defines the steps, tools, and data sources required to fulfill your request. 2. 
Built-In Tools and MCP Integration The agent uses its built-in capabilities (Azure CLI, Log Analytics workspace, Appinsights) and can also leverage 3 rd party data sources or tools via MCP server integration for extended functionality. 3. Results Analysis and Smart Summarization After execution, the agent analyzes results, identifies anomalies or issues, and provides actionable summaries not just raw data dumps. 4. Notification and Escalation Based on findings, the agent can: Post updates to Teams channels Create or update incidents Send email notifications Trigger follow-up actions Real-World Use Cases for Proactive Ops Here’s where scheduled tasks shine for SRE teams: Use Case Prompt Example Schedule Security Posture Check “Scan all subscriptions for resources with public endpoints and flag any that shouldn’t be exposed” Daily Cost Anomaly Detection “Compare this week’s spend against last week and alert if any service exceeds 20% growth” Weekly Compliance Drift Detection “Check all storage accounts for encryption settings and report any non-compliant resources” Daily Resource Health Summary “Summarize the health status of all production VMs and highlight any degraded instances” Every 4 hours Incident Trend Analysis “Analyze ICM incidents from the past week, identify patterns, and summarize top contributing services” Weekly Getting Started in 3 Steps Step 1: Define Your Intent Write a natural language prompt describing what you want to monitor or check. Be specific about: - What resources or scope - What conditions to look for - What action to take if issues are found Example: > “Every morning at 8 AM, check all production Kubernetes clusters for pods in CrashLoopBackOff state. 
If any are found, post a summary to the #sre-alerts Teams channel with cluster name, namespace, and pod details.”

Step 2: Set Your Schedule

Choose how often the task should run:
- Cron expressions for precise control
- Simple intervals (hourly, daily, weekly)

Step 3: Define Where to Receive Updates

Include in your prompt where you want results delivered when the task finishes execution. The agent can use its built-in tools and connectors to:
- Post summaries to a Teams channel
- Send email notifications
- Create or update ICM incidents

Example prompt with notification:
> “Check all production databases for long-running queries over 30 seconds. If any are found, post a summary to the #database-alerts Teams channel.”

Why This Matters for Proactive Operations

Traditional monitoring approaches have limitations:

| Traditional Approach | With Scheduled Tasks |
| --- | --- |
| Write scripts, maintain pipelines | Describe in plain language |
| Static thresholds and rules | Contextual, AI-powered analysis |
| Alert fatigue from noisy signals | Smart summarization of what matters |
| Separate tools for check vs. action | Unified detection and response |
| Requires dedicated DevOps effort | Any SRE can create and modify |

The result? Your team spends less time building and maintaining monitoring infrastructure and more time on the work that truly requires human expertise.

Best Practices for Scheduled Tasks

- Start simple, iterate: Begin with one or two high-value checks and expand as you gain confidence
- Be specific in prompts: The more context you provide, the better the results
- Set appropriate frequencies: Not everything needs to run hourly; match the schedule to the risk
- Review and refine: Check task results periodically and adjust prompts for better accuracy

What’s Next?

Scheduled tasks are just the beginning. We’re continuing to invest in capabilities that help SRE teams shift left: catching issues earlier, automating routine checks, and freeing up time for strategic work.

Ready to Start?
Use this sample that shows how to create a scheduled health check sub-agent: https://github.com/microsoft/sre-agent/blob/main/samples/automation/samples/02-scheduled-health-check-sample.md

This example demonstrates:
- Building a HealthCheckAgent using built-in tools like Azure CLI and Log Analytics Workspace
- Scheduling daily health checks for a container app at 7 AM
- Sending email alerts when anomalies are detected

🔗 Explore more samples here: https://github.com/microsoft/sre-agent/tree/main/samples

More to Learn

- Ignite 2025 announcements: https://aka.ms/ignite25/blog/sreagent
- Documentation: https://aka.ms/sreagent/docs
- Support & Feature Requests: https://github.com/microsoft/sre-agent/issues