azure functions
320 TopicsBuilding Reliable AI Travel Agents with the Durable Task Extension for Microsoft Agent Framework
The durable task extension for Microsoft Agent Framework makes all this possible. In this post, we'll walk through the AI Travel Planner, a C# application I built that demonstrates how to build reliable, scalable multi-agent applications using the durable task extension for Microsoft agent framework. While I work on the python version, I've included code snippets that show the python equivalent. If you haven't already seen the announcement post on the durable task extension for Microsoft Agent Framework, I suggest you read that first before continuing with this post: http://aka.ms/durable-extension-for-af-blog. In brief, production AI agents face real challenges: crashes can lose conversation history, unpredictable behavior makes debugging difficult, human-in-the-loop workflows require waiting without wasting resources, and variable demand needs flexible scaling. The durable task extension addresses each of these: Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture. Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard AI Travel Planner Architecture Overview The Travel Planner application takes user trip preferences and starts a workflow that orchestrates three specialized agent framework agents (a Destination Recommender, an Itinerary Planner, and a Local Recommender) to build a comprehensive, personalized travel plan. Once a travel plan is created, the workflow includes human-in-the-loop approval before booking the trip (mocked), showcasing how the durable task extension handles long-running operations easily: Application Workflow User Request: User submits travel preferences via React frontend Orchestration Scheduled: Azure Functions backend receives the request and schedules a deterministic agentic workflow using the Durable Task Extension for Agent Framework. Destination Recommendation: The orchestrator first coordinates the Destination Recommender agent to analyze preferences and suggest destinations Itinerary Planning and Local Recommendations: The orchestrator then parallelizes the invocation of the Itinerary Planner agent to create detailed day-by-day plans for the given destination and the Local Recommendations agent to add insider tips and attractions Storage: Created travel plan is saved to Azure Blob Storage Approval: User reviews and approves the plan (human-in-the-loop) Booking: Upon approval, booking of the trip completes Key Components Azure Static Web Apps: Hosts the React frontend Azure Functions (.NET 9): Serverless compute hosting the agents and workflow with automatic scaling Durable Task Extension for Microsoft Agent Framework: The AI agent SDK with durable task extension Durable Task Scheduler: Manages state persistence, orchestration, and observability Azure OpenAI (GPT-4o-mini): Powers the AI agents Now let’s dive into the code. Along the way, I’ll highlight the value the durable task extension brings and patterns you can apply to your own applications. Creating Durable Agents Making the standard Agent Framework agents durable agents is simple. Include the durable task extension package and register your agents within the ConfigureDurableAgents extension method and you automatically get: Persistent conversation sessions that survive restarts HTTP endpoints for agent interactions Automatic state checkpointing that survive restarts Distributed execution across instances C# FunctionsApplication .CreateBuilder(args) .ConfigureDurableAgents(configure => { configure.AddAIAgentFactory("DestinationRecommenderAgent", sp => chatClient.CreateAIAgent( instructions: "You are a travel destination expert...", name: "DestinationRecommenderAgent", services: sp)); configure.AddAIAgentFactory("ItineraryPlannerAgent", sp => chatClient.CreateAIAgent( instructions: "You are a travel itinerary planner...", name: "ItineraryPlannerAgent", services: sp, tools: [AIFunctionFactory.Create(CurrencyConverterTool.ConvertCurrency)])); configure.AddAIAgentFactory("LocalRecommendationsAgent", sp => chatClient.CreateAIAgent( instructions: "You are a local expert...", name: "LocalRecommendationsAgent", services: sp)); }); Python # Create the Azure OpenAI chat client chat_client = AzureOpenAIChatClient( endpoint=endpoint, deployment_name=deployment_name, credential=DefaultAzureCredential() ) # Destination Recommender Agent destination_recommender_agent = chat_client.create_agent( name="DestinationRecommenderAgent", instructions="You are a travel destination expert..." ) # Itinerary Planner Agent (with tools) itinerary_planner_agent = chat_client.create_agent( name="ItineraryPlannerAgent", instructions="You are a travel itinerary planner...", tools=[get_exchange_rate, convert_currency] ) # Local Recommendations Agent local_recommendations_agent = chat_client.create_agent( name="LocalRecommendationsAgent", instructions="You are a local expert..." ) # Configure Function App with Durable Agents. AgentFunctionApp is where the magic happens app = AgentFunctionApp(agents=[ destination_recommender_agent, itinerary_planner_agent, local_recommendations_agent ]) The Orchestration Programming Model The durable task extension uses an intuitive async/await programming model for deterministic orchestration. You write orchestration logic as ordinary imperative code (if/else, try/catch), and the framework handles all the complexity of coordination, durability, retries, and distributed execution. The Travel Planner Orchestration Here's the actual orchestration from the application that coordinates all three agents, runs tasks in parallel, handles human approval, and books the trip: C# [Function(nameof(RunTravelPlannerOrchestration))] public async Task<TravelPlanResult> RunTravelPlannerOrchestration( [OrchestrationTrigger] TaskOrchestrationContext context) { var travelRequest = context.GetInput<TravelRequest>()!; // Get durable agents and create conversation threads DurableAIAgent destinationAgent = context.GetAgent("DestinationRecommenderAgent"); DurableAIAgent itineraryAgent = context.GetAgent("ItineraryPlannerAgent"); DurableAIAgent localAgent = context.GetAgent("LocalRecommendationsAgent"); // Step 1: Get destination recommendations var destinations = await destinationAgent.RunAsync<DestinationRecommendations>( $"Recommend destinations for {travelRequest.Preferences}", destinationAgent.GetNewThread()); var topDestination = destinations.Result.Recommendations.First(); // Steps 2 & 3: Run itinerary and local recommendations IN PARALLEL var itineraryTask = itineraryAgent.RunAsync<TravelItinerary>( $"Create itinerary for {topDestination.Name}", itineraryAgent.GetNewThread()); var localTask = localAgent.RunAsync<LocalRecommendations>( $"Local recommendations for {topDestination.Name}", localAgent.GetNewThread()); await Task.WhenAll(itineraryTask, localTask); // Step 4: Save to blob storage await context.CallActivityAsync(nameof(SaveTravelPlanToBlob), travelPlan); // Step 5: Wait for human approval (NO COMPUTE COSTS while waiting!) var approval = await context.WaitForExternalEvent<ApprovalResponse>( "ApprovalEvent", TimeSpan.FromDays(7)); // Step 6: Book if approved if (approval.Approved) await context.CallActivityAsync(nameof(BookTrip), travelPlan); return new TravelPlanResult(travelPlan, approval.Approved); } Python app.orchestration_trigger(context_name="context") def travel_planner_orchestration(context: df.DurableOrchestrationContext): travel_request_data = context.get_input() travel_request = TravelRequest(**travel_request_data) # Get durable agents and create conversation threads destination_agent = app.get_agent(context, "DestinationRecommenderAgent") itinerary_agent = app.get_agent(context, "ItineraryPlannerAgent") local_agent = app.get_agent(context, "LocalRecommendationsAgent") # Step 1: Get destination recommendations destinations_result = yield destination_agent.run( messages=f"Recommend destinations for {travel_request.preferences}", thread=destination_agent.get_new_thread(), response_format=DestinationRecommendations ) destinations = cast(DestinationRecommendations, destinations_result.value) top_destination = destinations.recommendations[0] # Steps 2 & 3: Run itinerary and local recommendations IN PARALLEL itinerary_task = itinerary_agent.run( messages=f"Create itinerary for {top_destination.destination_name}", thread=itinerary_agent.get_new_thread(), response_format=Itinerary ) local_task = local_agent.run( messages=f"Local recommendations for {top_destination.destination_name}", thread=local_agent.get_new_thread(), response_format=LocalRecommendations ) results = yield context.task_all([itinerary_task, local_task]) itinerary = cast(Itinerary, results[0].value) local_recs = cast(LocalRecommendations, results[1].value) # Step 4: Save to blob storage yield context.call_activity("save_travel_plan_to_blob", travel_plan) # Step 5: Wait for human approval (NO COMPUTE COSTS while waiting!) approval_task = context.wait_for_external_event("ApprovalEvent") timeout_task = context.create_timer( context.current_utc_datetime + timedelta(days=7)) winner = yield context.task_any([approval_task, timeout_task]) if winner == approval_task: timeout_task.cancel() approval = approval_task.result # Step 6: Book if approved if approval.get("approved"): yield context.call_activity("book_trip", travel_plan) return TravelPlanResult(plan=travel_plan, approved=approval.get("approved")) return TravelPlanResult(plan=travel_plan, approved=False) Notice how the orchestration combines: Agent calls (await agent.RunAsync(...)) for AI-driven decisions Parallel execution (Task.WhenAll) for running multiple agents concurrently Activity calls (await context.CallActivityAsync(...)) for non-intelligent business tasks Human-in-the-loop (await context.WaitForExternalEvent(...)) for approval workflows The orchestration automatically checkpoints after each step. If a failure occurs, completed steps aren't re-executed. The orchestration resumes exactly where it left off, no need for manual intervention. Agent Patterns in Action Agent Chaining: Sequential Handoffs The Travel Planner demonstrates agent chaining where the Destination Recommender's output feeds into both the Itinerary Planner and Local Recommendations agents: C# // Agent 1: Get destination recommendations var destinations = await destinationAgent.RunAsync<DestinationRecommendations>(prompt, thread); var topDestination = destinations.Result.Recommendations.First(); // Agent 2: Create itinerary based on Agent 1's output var itinerary = await itineraryAgent.RunAsync<TravelItinerary>( $"Create itinerary for {topDestination.Name}", thread); Python # Agent 1: Get destination recommendations destinations_result = yield destination_agent.run( messages=prompt, thread=thread, response_format=DestinationRecommendations ) destinations = cast(DestinationRecommendations, destinations_result.value) top_destination = destinations.recommendations[0] # Agent 2: Create itinerary based on Agent 1's output itinerary_result = yield itinerary_agent.run( messages=f"Create itinerary for {top_destination.destination_name}", thread=thread, response_format=Itinerary ) itinerary = cast(Itinerary, itinerary_result.value) Agent Parallelization: Concurrent Execution The app runs the Itinerary Planner and Local Recommendations agents in parallel to reduce latency: C# // Launch both agent calls simultaneously var itineraryTask = itineraryAgent.RunAsync<TravelItinerary>(itineraryPrompt, thread1); var localTask = localAgent.RunAsync<LocalRecommendations>(localPrompt, thread2); // Wait for both to complete await Task.WhenAll(itineraryTask, localTask); Python # Launch both agent calls simultaneously itinerary_task = itinerary_agent.run( messages=itinerary_prompt, thread=thread1, response_format=Itinerary ) local_task = local_agent.run( messages=local_prompt, thread=thread2, response_format=LocalRecommendations ) # Wait for both to complete results = yield context.task_all([itinerary_task, local_task]) itinerary = cast(Itinerary, results[0].value) local_recs = cast(LocalRecommendations, results[1].value) Human-in-the-Loop: Approval Workflows The Travel Planner includes a complete human-in-the-loop pattern. After generating the travel plan, the workflow pauses for user approval: C# // Send approval request notification await context.CallActivityAsync(nameof(RequestApproval), travelPlan); // Wait for approval - NO COMPUTE COSTS OR LLM TOKENS while waiting! var approval = await context.WaitForExternalEvent<ApprovalResponse>( "ApprovalEvent", TimeSpan.FromDays(7)); if (approval.Approved) await context.CallActivityAsync(nameof(BookTrip), travelPlan); Python # Send approval request notification await context.CallActivityAsync(nameof(RequestApproval), travelPlan); # Wait for approval - NO COMPUTE COSTS OR LLM TOKENS while waiting! var approval = await context.WaitForExternalEvent<ApprovalResponse>( "ApprovalEvent", TimeSpan.FromDays(7)); if (approval.Approved) await context.CallActivityAsync(nameof(BookTrip), travelPlan); The API endpoint to handle approval responses: C# [Function(nameof(HandleApprovalResponse))] public async Task HandleApprovalResponse( [HttpTrigger("post", Route = "approve/{instanceId}")] HttpRequestData req, string instanceId, [DurableClient] DurableTaskClient client) { var approval = await req.ReadFromJsonAsync<ApprovalResponse>(); await client.RaiseEventAsync(instanceId, "ApprovalEvent", approval); } Python app.function_name(name="ApproveTravelPlan") app.route(route="travel-planner/approve/{instance_id}", methods=["POST"]) app.durable_client_input(client_name="client") async def approve_travel_plan(req: func.HttpRequest, client) -> func.HttpResponse: instance_id = req.route_params.get("instance_id") approval = req.get_json() await client.raise_event(instance_id, "ApprovalEvent", approval) return func.HttpResponse( json.dumps({"message": "Approval processed"}), status_code=200, mimetype="application/json" ) The workflow generates a complete travel plan and waits up to 7 days for user approval. During this entire waiting period, since we're hosting this application on the Functions Flex Consumption plan, the app scales down and zero compute resources or LLM tokens are consumed. When the user approves (or the timeout expires), the app scales back up and the orchestration automatically resumes with full context intact. Real-Time Monitoring with the Durable Task Scheduler Since we're using the Durable Task Scheduler as the backend for our durable agents, we're provided with a built-in dashboard for monitoring our agents and orchestrations in real-time. Agent Thread Insights Conversation history: View complete conversation threads for each agent session, including all messages, tool calls, and agent decisions Task timing: Monitor how long specific tasks and agent interactions take to complete Orchestration Insights Multi-agent visualization: See the execution flow across multiple agents with visual representation of parallel executions and branching Real-time monitoring: Track active orchestrations, queued work items, and agent states Performance metrics: Monitor response times, token usage, and orchestration duration Debugging Capabilities View structured inputs and outputs for activities, agents, and tool calls Trace tool invocations and their outcomes Monitor external event handling for human-in-the-loop scenarios The dashboard enables you to understand exactly what your agents and workflows are doing, diagnose issues quickly, and optimize performance, all without adding custom logging to your code. Try The Travel Planner Application That’s it! That’s the gist of how the AI Travel Planner application is put together and some of the key components that it took to build it. I'd love for you to try the application out for yourself. It's fully instrumented with the Azure Developer CLI and Bicep, so you can deploy it to Azure with a few simple CLI commands. Click here to try the AI Travel Planner sample Python version coming soon! Demo Video Learn More Durable Task Extension Overview Durable Agent Features Durable Task Scheduler AI Travel Planner GitHub Repository234Views1like0CommentsCall Function App from Azure Data Factory with Managed Identity Authentication
Integrating Azure Function Apps into your Azure Data Factory (ADF) workflows is a common practice. To enhance security beyond the use of function API keys, leveraging managed identity authentication is strongly recommended. Given the fact that many existing guides were outdated with recent updates to Azure services, this article provides a comprehensive, up-to-date walkthrough on configuring managed identity in ADF to securely call Function Apps. The provided methods can also be adapted to other Azure services that need to call Function Apps with managed identity authentication. The high level process is: Enable Managed Identity on Data Factory Configure Microsoft Entra Sign-in on Azure Function App Configure Linked Service in Data Factory Assign Permissions to the Data Factory in Azure Function Step 1: Enable Managed Identity on Data Factory On the Data Factory’s portal, go to Managed Identities, and enable a system assigned managed identity. Step 2: Configure Microsoft Entra Sign-in on Azure Function App 1. Go to Function App portal and enable Authentication. Choose "Microsoft" as the identity provider. 2. Add an app registration to the app, it could be an existing one or you can choose to let the platform create a new app registration. 3. Next, allow the ADF as a client application to authenticate to the function app. This step is a new requirement from previous guides. If these settings are not correctly set, the 403 response will be returned: Add the Application ID of the ADF managed identity in Allowed client application and Object ID of the ADF managed identity in the Allowed identities. If the requests are only allowed from specific tenants, add the Tenant ID of the managed identity in the last box. 4. This part sets the response from function app for the unauthenticated requests. We should set the response as "HTTP 401 Unauthorized: recommended for APIs" as sign-in page is not feasible for API calls from ADF. 5. Then, click next and use the default permission option. 6. After everything is set, click "Add" to complete the configuration. Copy the generated App (client) id, as this is used in data factory to handle authorization. Step 3: Configure Linked Service in Data Factory 1. To use an Azure Function activity in a pipeline, follow the steps here: Create an Azure Function activity with UI 2. Then Edit or New a Azure Function Linked Service. 3. Change authentication method to System Assigned Managed Identity, and paste the copied client ID of function app identity provider from Step 2 into Resource ID. This step is necessary as authorization does not work without this. Step 4: Assign Permissions to the Data Factory in Azure Function 1. On the function app portal, go to Access control (IAM), and Add a new role assignment. 2. Assign reader role. 3. Assign the Data Factory’s Managed Identity to that role. After everything is set, test that the function app can be called from Azure Data Factory successfully. Reference: https://prodata.ie/2022/06/16/enabling-managed-identity-authentication-on-azure-functions-in-data-factory/ https://learn.microsoft.com/en-us/azure/data-factory/control-flow-azure-function-activity https://docs.azure.cn/en-us/app-service/overview-authentication-authorization1.6KViews0likes2CommentsIndustry-Wide Certificate Changes Impacting Azure App Service Certificates
Executive Summary In early 2026, industry-wide changes mandated by browser applications and the CA/B Forum will affect both how TLS certificates are issued as well as their validity period. The CA/B Forum is a vendor body that establishes standards for securing websites and online communications through SSL/TLS certificates. Azure App Service is aligning with these standards for both App Service Managed Certificates (ASMC, free, DigiCert-issued) and App Service Certificates (ASC, paid, GoDaddy-issued). Most customers will experience no disruption. Action is required only if you pin certificates or use them for client authentication (mTLS). Who Should Read This? App Service administrators Security and compliance teams Anyone responsible for certificate management or application security Quick Reference: What’s Changing & What To Do Topic ASMC (Managed, free) ASC (GoDaddy, paid) Required Action New Cert Chain New chain (no action unless pinned) New chain (no action unless pinned) Remove certificate pinning Client Auth EKU Not supported (no action unless cert is used for mTLS) Not supported (no action unless cert is used for mTLS) Transition from mTLS Validity No change (already compliant) Two overlapping certs issued for the full year None (automated) If you do not pin certificates or use them for mTLS, no action is required. Timeline of Key Dates Date Change Action Required Mid-Jan 2026 and after ASMC migrates to new chain ASMC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Mar 2026 and after ASC validity shortened ASC migrates to new chain ASC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Actions Checklist For All Users Review your use of App Service certificates. If you do not pin these certificates and do not use them for mTLS, no action is required. If You Pin Certificates (ASMC or ASC) Remove all certificate or chain pinning before their respective key change dates to avoid service disruption. See Best Practices: Certificate Pinning. If You Use Certificates for Client Authentication (mTLS) Switch to an alternative authentication method before their respective key change dates to avoid service disruption, as client authentication EKU will no longer be supported for these certificates. See Sunsetting the client authentication EKU from DigiCert public TLS certificates. See Set Up TLS Mutual Authentication - Azure App Service Details & Rationale Why Are These Changes Happening? These updates are required by major browser programs (e.g., Chrome) and apply to all public CAs. They are designed to enhance security and compliance across the industry. Azure App Service is automating updates to minimize customer impact. What’s Changing? New Certificate Chain Certificates will be issued from a new chain to maintain browser trust. Impact: Remove any certificate pinning to avoid disruption. Removal of Client Authentication EKU Newly issued certificates will not support client authentication EKU. This change aligns with Google Chrome’s root program requirements to enhance security. Impact: If you use these certificates for mTLS, transition to an alternate authentication method. Shortening of Certificate Validity Certificate validity is now limited to a maximum of 200 days. Impact: ASMC is already compliant; ASC will automatically issue two overlapping certificates to cover one year. No billing impact. Frequently Asked Questions (FAQs) Will I lose coverage due to shorter validity? No. For App Service Certificate, App Service will issue two certificates to span the full year you purchased. Is this unique to DigiCert and GoDaddy? No. This is an industry-wide change. Do these changes impact certificates from other CAs? Yes. These changes are an industry-wide change. We recommend you reach out to your certificates’ CA for more information. Do I need to act today? If you do not pin or use these certs for mTLS, no action is required. Glossary ASMC: App Service Managed Certificate (free, DigiCert-issued) ASC: App Service Certificate (paid, GoDaddy-issued) EKU: Extended Key Usage mTLS: Mutual TLS (client certificate authentication) CA/B Forum: Certification Authority/Browser Forum Additional Resources Changes to the Managed TLS Feature Set Up TLS Mutual Authentication Azure App Service Best Practices – Certificate pinning DigiCert Root and Intermediate CA Certificate Updates 2023 Sunsetting the client authentication EKU from DigiCert public TLS certificates Feedback & Support If you have questions or need help, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.520Views1like0Comments[Design Pattern] Handling race conditions and state in serverless data pipelines
Hello community, I recently faced a tricky data engineering challenge involving a lot of Parquet files (about 2 million records) that needed to be ingested, transformed, and split into different entities. The hard part wasn't the volume, but the logic. We needed to generate globally unique, sequential IDs for specific columns while keeping the execution time under two hours. We were restricted to using only Azure Functions, ADF, and Storage. This created a conflict: we needed parallel processing to meet the time limit, but parallel processing usually breaks sequential ID generation due to race conditions on the counters. I documented the three architecture patterns we tested to solve this: Sequential processing with ADF (Safe, but failed the 2-hour time limit). 2. Parallel processing with external locking/e-tags on Table Storage (Too complex and we still hit issues with inserts). 3. A "Fan-Out/Fan-In" pattern using Azure Durable Functions and Durable Entities. We ended up going with Durable Entities. Since they act as stateful actors, they allowed us to handle the ID counter state sequentially in memory while the heavy lifting (transformation) ran in parallel. It solved the race condition issue without killing performance. I wrote a detailed breakdown of the logic and trade-offs here if anyone is interested in the implementation details: https://medium.com/@yahiachames/data-ingestion-pipeline-a-data-engineers-dilemma-and-azure-solutions-7c4b36f11351 I am curious if others have used Durable Entities for this kind of ETL work, or if you usually rely on an external database sequence to handle ID generation in serverless setups? Thanks, Chameseddine18Views0likes1CommentPantone’s Palette Generator enhances creative exploration with agentic AI on Azure
Color can be powerful. When creative professionals shape the mood and direction of their work, color plays a vital role because it provides context and cues for the end product or creation. For more than 60 years, creatives from all areas of design—including fashion, product, and digital—have turned to Pantone color guides to translate inspiration into precise, reproducible color choices. These guides offer a shared language for colors, as well as inspiration and communication across industries. Once rooted in physical tools, Pantone has evolved to meet the needs of modern creators through its trend forecasting, consulting services, and digital platform. Today, Pantone Connect and its multi-agent solution called the Pantone Palette Generator seamlessly bring color inspiration and accuracy into everyday design workflows (as well as the New York City mayoral race). Simply by typing in a prompt, designers can generate palettes in seconds. Available in Pantone Connect, the tool uses Azure services like Microsoft Foundry, Azure AI Search, and Azure Cosmos DB to serve up the company’s vast collection of trend and color research from the color experts at the Pantone Color Institute. reached in seconds instead of days. Now, with Microsoft Foundry, creatives can use agents to get instant color palettes and suggestions based on human insights and trend direction.” Turning Pantone’s color legacy into an AI offering The Palette Generator accelerates the process of researching colors and helps designers find inspiration or validate some of their ideas through trend-backed research. “Pantone wants to be where our customers are,” says Rohani Jotshi, Director of Software Engineering and Data at Pantone. “As workflows become increasingly digital, we wanted to give our customers a way to find inspiration while keeping the same level of accuracy and trust they expect from Pantone.” The Palette Generator taps into thousands of articles from Pantone’s Color Insider library, as well as trend guides and physical color books in a way that preserves the company’s color standards science while streamlining the creative process. Built entirely on Microsoft Foundry, the solution uses Azure AI Search for agentic retrieval-augmented generation (RAG) and Azure OpenAI in Foundry Models to reason over the data. It quickly serves up palette options in response to questions like “Show me soft pastels for an eco-friendly line of baby clothes” or “I want to see vibrant metallics for next spring.” Over the course of two months, the Pantone team built the initial proof of concept for the Palette Generator, using GitHub Copilot to streamline the process and save over 200 hours of work across multiple sprints. This allowed Pantone’s engineers to focus on improving prompt engineering, adding new agent capabilities, and refining orchestration logic rather than writing repetitive code. Building a multi-agent architecture that accelerates creativity The Pantone team worked with Microsoft to develop the multi-agent architecture, which is made up of three connected agents. Using Microsoft Agent Framework—an open source development kit for building AI orchestration systems—it was a straightforward process to bring the agents together into one workflow. “The Microsoft team recommended Microsoft Agent Framework and when we tried it, we saw how it was extremely fast and easy to create architectural patterns,” says Kristijan Risteski, Solutions Architect at Pantone. “With Microsoft Agent Framework, we can spin up a model in five lines of code to connect our agents.” When a user types in a question, they interact with an orchestrator agent that routes prompts and coordinates the more specialized agents. Behind the scenes an additional agent retrieves contextually relevant insights from Pantone’s proprietary Color Insider dataset. Using Azure AI Search with vectorized data indexing, this agent interprets the semantics of a user’s query rather than relying solely on keywords. A third agent then applies rules from color science to assemble a balanced palette. This agent ensures the output is a color combination that meets harmony, contrast, and accessibility standards. The result is a set of Pantone-curated colors that match the emotional and aesthetic tone of the request. “All of this happens in seconds,” says Risteski. To manage conversation flow and achieve long-term data persistence, Pantone uses Azure Cosmos DB, which stores user sessions, prompts, and results. The database not only enables designers to revisit past palette explorations but also provides Pantone with valuable usage intelligence to refine the system over time. “We use Azure Cosmos DB to track inputs and outputs,” says Risteski. “That data helps us fine-tune prompts, measure engagement, and plan how we’ll train future models.” Improving accuracy and performance with Azure AI Search With Azure AI Search, the Palette Generator can understand the nuance of color language. Instead of relying solely on keyword searches that might miss the complexity of words like “vibrant” or “muted,” Pantone’s team decided to use a vectorized index for more accurate palette results. Using the built-in vectorization capability of Azure AI Search, the team converted their color knowledge base—including text-based color psychology and trend articles—into numerical embeddings. “Overall, vector search gave us better results because it could understand the intent of the prompt, not just the words,“ says Risteski. “If someone types, ‘Show me colors that feel serene and oceanic,’ the system understands intent. It finds the right references across our color psychology and trend archives and delivers them instantly.” The team also found ways to reduce latency as they evolved their proof of concept. Initially, they encountered slow inference times and performance lags when retrieving search results. By switching from GPT-4.1 to GPT-5, latency improved. And using Azure AI Search to manage ranking and filtering results helped reduce the number of calls to the large language model (LLM). “With Azure, we just get the articles, put them in a bucket, and say ‘index it now,’ says Risteski. “It takes one or two minutes—and that’s it. The results are so much better than traditional search.” Moving from inspiration to palettes faster The Palette Generator has transformed how designers and color enthusiasts interact with Pantone’s expertise. What once took weeks of research and review can now be done in seconds. “Typically, if someone wanted to develop a palette for a product launch, it might take many months of research,” says Jotshi. “Now, they can type one sentence to describe their inspiration then immediately find Pantone-backed insight and options. Human curation will still be hugely important, but a strong set of starting options can significantly accelerate the palette development process.” Expanding the palette: The next phase for Pantone’s design agent Rapidly launching the Palette Generator in beta has redefined what the Pantone engineering team thought was possible. “We’re a small development team, but with Azure we built an enterprise-grade AI system in a matter of weeks,” says Risteski. “That’s a huge win for us.” Next up, the team plans to migrate the entire orchestration layer to Azure Functions, moving to a fully scalable, serverless deployment. This will allow Pantone to run its agents more efficiently, handle variable workloads automatically, and integrate seamlessly with other Azure products such as Microsoft Foundry and Azure Cosmos DB. At the same time, Pantone plans to expand its multi-agent system to include new specialized agents, including one focused on palette harmony and another focused on trend prediction.468Views1like0CommentsImportant Changes to App Service Managed Certificates: Is Your Certificate Affected?
Overview As part of an upcoming industry-wide change, DigiCert, the Certificate Authority (CA) for Azure App Service Managed Certificates (ASMC), is required to migrate to a new validation platform to meet multi-perspective issuance corroboration (MPIC) requirements. While most certificates will not be impacted by this change, certain site configurations and setups may prevent certificate issuance or renewal starting July 28, 2025. Update December 8, 2025 We’ve published an update in November about how App Service Managed Certificates can now be supported on sites that block public access. This reverses the limitation introduced in July 2025, as mentioned in this blog. Note: This blog post reflects a point-in-time update and will not be revised. For the latest and most accurate details on App Service Managed Certificates, please refer to official documentation or subsequent updates. Learn more about the November 2025 update here: Follow-Up to ‘Important Changes to App Service Managed Certificates’: November 2025 Update. August 5, 2025 We’ve published a Microsoft Learn documentation titled App Service Managed Certificate (ASMC) changes – July 28, 2025 that contains more in-depth mitigation guidance and a growing FAQ section to support the changes outlined in this blog post. While the blog currently contains the most complete overview, the documentation will soon be updated to reflect all blog content. Going forward, any new information or clarifications will be added to the documentation page, so we recommend bookmarking it for the latest guidance. What Will the Change Look Like? For most customers: No disruption. Certificate issuance and renewals will continue as expected for eligible site configurations. For impacted scenarios: Certificate requests will fail (no certificate issued) starting July 28, 2025, if your site configuration is not supported. Existing certificates will remain valid until their expiration (up to six months after last renewal). Impacted Scenarios You will be affected by this change if any of the following apply to your site configurations: Your site is not publicly accessible: Public accessibility to your app is required. If your app is only accessible privately (e.g., requiring a client certificate for access, disabling public network access, using private endpoints or IP restrictions), you will not be able to create or renew a managed certificate. Other site configurations or setup methods not explicitly listed here that restrict public access, such as firewalls, authentication gateways, or any custom access policies, can also impact eligibility for managed certificate issuance or renewal. Action: Ensure your app is accessible from the public internet. However, if you need to limit access to your app, then you must acquire your own SSL certificate and add it to your site. Your site uses Azure Traffic Manager "nested" or "external" endpoints: Only “Azure Endpoints” on Traffic Manager will be supported for certificate creation and renewal. “Nested endpoints” and “External endpoints” will not be supported. Action: Transition to using "Azure Endpoints". However, if you cannot, then you must obtain a different SSL certificate for your domain and add it to your site. Your site relies on *.trafficmanager.net domain: Certificates for *.trafficmanager.net domains will not be supported for creation or renewal. Action: Add a custom domain to your app and point the custom domain to your *.trafficmanager.net domain. After that, secure the custom domain with a new SSL certificate. If none of the above applies, no further action is required. How to Identify Impacted Resources? To assist with the upcoming changes, you can use Azure Resource Graph (ARG) queries to help identify resources that may be affected under each scenario. Please note that these queries are provided as a starting point and may not capture every configuration. Review your environment for any unique setups or custom configurations. Scenario 1: Sites Not Publicly Accessible This ARG query retrieves a list of sites that either have the public network access property disabled or are configured to use client certificates. It then filters for sites that are using App Service Managed Certificates (ASMC) for their custom hostname SSL bindings. These certificates are the ones that could be affected by the upcoming changes. However, please note that this query does not provide complete coverage, as there may be additional configurations impacting public access to your app that are not included here. Ultimately, this query serves as a helpful guide for users, but a thorough review of your environment is recommended. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment. // ARG Query: Identify App Service sites that commonly restrict public access and use ASMC for custom hostname SSL bindings resources | where type == "microsoft.web/sites" // Extract relevant properties for public access and client certificate settings | extend publicNetworkAccess = tolower(tostring(properties.publicNetworkAccess)), clientCertEnabled = tolower(tostring(properties.clientCertEnabled)) // Filter for sites that either have public network access disabled // or have client certificates enabled (both can restrict public access) | where publicNetworkAccess == "disabled" or clientCertEnabled != "false" // Expand the list of SSL bindings for each site | mv-expand hostNameSslState = properties.hostNameSslStates | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint) // Only consider custom domains (exclude default *.azurewebsites.net) and sites with an SSL certificate bound | where tolower(hostName) !endswith "azurewebsites.net" and isnotempty(thumbprint) // Select key site properties for output | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint, publicNetworkAccess, clientCertEnabled // Join with certificates to find only those using App Service Managed Certificates (ASMC) // ASMCs are identified by the presence of the "canonicalName" property | join kind=inner ( resources | where type == "microsoft.web/certificates" | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property | where isnotempty(canonicalName) | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName ) on $left.thumbprint == $right.certThumbprint // Final output: sites with restricted public access and using ASMC for custom hostname SSL bindings | project siteName, siteId, siteResourceGroup, publicNetworkAccess, clientCertEnabled, thumbprint, certName, certId, certResourceGroup, certExpiration, canonicalName Scenario 2: Traffic Manager Endpoint Types For this scenario, please manually review your Traffic Manager profile configurations to ensure only “Azure Endpoints” are in use. We recommend inspecting your Traffic Manager profiles directly in the Azure portal or using relevant APIs to confirm your setup and ensure compliance with the new requirements. Scenario 3: Certificates Issued to *.trafficmanager.net Domains This ARG query helps you identify App Service Managed Certificates (ASMC) that were issued to *.trafficmanager.net domains. In addition, it also checks whether any web apps are currently using those certificates for custom domain SSL bindings. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment. // ARG Query: Identify App Service Managed Certificates (ASMC) issued to *.trafficmanager.net domains // Also checks if any web apps are currently using those certificates for custom domain SSL bindings resources | where type == "microsoft.web/certificates" // Extract the certificate thumbprint and canonicalName (ASMCs have a canonicalName property) | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property // Filter for certificates issued to *.trafficmanager.net domains | where canonicalName endswith "trafficmanager.net" // Select key certificate properties for output | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName // Join with web apps to see if any are using these certificates for SSL bindings | join kind=leftouter ( resources | where type == "microsoft.web/sites" // Expand the list of SSL bindings for each site | mv-expand hostNameSslState = properties.hostNameSslStates | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint) // Only consider bindings for *.trafficmanager.net custom domains with a certificate bound | where tolower(hostName) endswith "trafficmanager.net" and isnotempty(thumbprint) // Select key site properties for output | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint ) on $left.certThumbprint == $right.thumbprint // Final output: ASMCs for *.trafficmanager.net domains and any web apps using them | project certName, certId, certResourceGroup, certExpiration, canonicalName, siteName, siteId, siteResourceGroup Ongoing Updates We will continue to update this post with any new queries or important changes as they become available. Be sure to check back for the latest information. Note on Comments We hope this information helps you navigate the upcoming changes. To keep this post clear and focused, comments are closed. If you have questions, need help, or want to share tips or alternative detection methods, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.24KViews1like1CommentFollow-Up to ‘Important Changes to App Service Managed Certificates’: November 2025 Update
This post provides an update to the Tech Community article ‘Important Changes to App Service Managed Certificates: Is Your Certificate Affected?’ and covers the latest changes introduced since July 2025. With the November 2025 update, ASMC now remains supported even if the site is not publicly accessible, provided all other requirements are met. Details on requirements, exceptions, and validation steps are included below. Background Context to July 2025 Changes As of July 2025, all ASMC certificate issuance and renewals use HTTP token validation. Previously, public access was required because DigiCert needed to access the endpoint https://<hostname>/.well-known/pki-validation/fileauth.txt to verify the token before issuing the certificate. App Service automatically places this token during certificate creation and renewal. If DigiCert cannot access this endpoint, domain ownership validation fails, and the certificate cannot be issued. November 2025 Update Starting November 2025, App Service now allows DigiCert's requests to the https://<hostname>/.well-known/pki-validation/fileauth.txt endpoint, even if the site blocks public access. If there’s a request to create an App Service Managed Certificate (ASMC), App Service places the domain validation token at the validation endpoint. When DigiCert tries to reach the validation endpoint, App Service front ends present the token, and the request terminates at the front end layer. DigiCert's request does not reach the workers running the application. This behavior is now the default for ASMC issuance for initial certificate creation and renewals. Customers do not need to specifically allow DigiCert's IP addresses. Exceptions and Unsupported Scenarios This update addresses most scenarios that restrict public access, including App Service Authentication, disabling public access, IP restrictions, private endpoints, and client certificates. However, a public DNS record is still required. For example, sites using a private endpoint with a custom domain on a private DNS cannot validate domain ownership and obtain a certificate. Even with all validations now relying on HTTP token validation and DigiCert requests being allowed through, certain configurations are still not supported for ASMC: Sites configured as "Nested" or "External" endpoints behind Traffic Manager. Only "Azure" endpoints are supported. Certificates requested for domains ending in *.trafficmanager.net are not supported. Testing Customers can easily test whether their site’s configuration or set-up supports ASMC by attempting to create one for their site. If the initial request succeeds, renewals should also work, provided all requirements are met and the site is not listed in an unsupported scenario.5.8KViews1like0CommentsAzure API Management Your Auth Gateway For MCP Servers
The Model Context Protocol (MCP) is quickly becoming the standard for integrating Tools 🛠️ with Agents 🤖 and Azure API Management is at the fore-front, ready to support this open-source protocol 🚀. You may have already encountered discussions about MCP, so let's clarify some key concepts: Model Context Protocol (MCP) is a standardized way, (a protocol), for AI models to interact with external tools, (and either read data or perform actions) and to enrich context for ANY language models. AI Agents/Assistants are autonomous LLM-powered applications with the ability to use tools to connect to external services required to accomplish tasks on behalf of users. Tools are components made available to Agents allowing them to interact with external systems, perform computation, and take actions to achieve specific goals. Azure API Management: As a platform-as-a-service, API Management supports the complete API lifecycle, enabling organizations to create, publish, secure, and analyze APIs with built-in governance, security, analytics, and scalability. New Cool Kid in Town - MCP AI Agents are becoming widely adopted due to enhanced Large Language Model (LLM) capabilities. However, even the most advanced models face limitations due to their isolation from external data. Each new data source requires custom implementations to extract, prepare, and make data accessible for any model(s). - A lot of heavy lifting. Anthropic developed an open-source standard - the Model Context Protocol (MCP), to connect your agents to external data sources such as local data sources (databases or computer files) or remote services (systems available over the internet through e.g. APIs). MCP Hosts: LLM applications such as chat apps or AI assistant in your IDEs (like GitHub Copilot in VS Code) that need to access external capabilities MCP Clients: Protocol clients that maintain 1:1 connections with servers, inside the host application MCP Servers: Lightweight programs that each expose specific capabilities and provide context, tools, and prompts to clients MCP Protocol: Transport layer in the middle At its core, MCP follows a client-server architecture where a host application can connect to multiple servers. Whenever your MCP host or client needs a tool, it is going to connect to the MCP server. The MCP server will then connect to for example a database or an API. MCP hosts and servers will connect with each other through the MCP protocol. You can create your own custom MCP Servers that connect to your or organizational data sources. For a quick start, please visit our GitHub repository to learn how to build a remote MCP server using Azure Functions without authentication: https://aka.ms/mcp-remote Remote vs. Local MCP Servers The MCP standard supports two modes of operation: Remote MCP servers: MCP clients connect to MCP servers over the Internet, establishing a connection using HTTP and Server-Sent Events (SSE), and authorizing the MCP client access to resources on the user's account using OAuth. Local MCP servers: MCP clients connect to MCP servers on the same machine, using stdio as a local transport method. Azure API Management as the AI Auth Gateway Now that we have learned that MCP servers can connect to remote services through an API. The question now rises, how can we expose our remote MCP servers in a secure and scalable way? This is where Azure API Management comes in. A way that we can securely and safely expose tools as MCP servers. Azure API Management provides: Security: AI agents often need to access sensitive data. API Management as a remote MCP proxy safeguards organizational data through authentication and authorization. Scalability: As the number of LLM interactions and external tool integrations grows, API Management ensures the system can handle the load. Security remains to be a critical piece of building MCP servers, as agents will need to securely connect to protected endpoints (tools) to perform certain actions or read protected data. When building remote MCP servers, you need a way to allow users to login (Authenticate) and allow them to grant the MCP client access to resources on their account (Authorization). MCP - Current Authorization Challenges State: 4/10/2025 Recent changes in MCP authorization have sparked significant debate within the community. 🔍 𝗞𝗲𝘆 𝗖𝗵𝗮𝗹𝗹𝗲𝗻𝗴𝗲𝘀 with the Authorization Changes: The MCP server is now treated as both a resource server AND an authorization server. This dual role has fundamental implications for MCP server developers and runtime operations. 💡 𝗢𝘂𝗿 𝗦𝗼𝗹𝘂𝘁𝗶𝗼𝗻: To address these challenges, we recommend using 𝗔𝘇𝘂𝗿𝗲 𝗔𝗣𝗜 𝗠𝗮𝗻𝗮𝗴𝗲𝗺𝗲𝗻𝘁 as your authorization gateway for remote MCP servers. 🔗For an enterprise-ready solution, please check out our azd up sample repo to learn how to build a remote MCP server using Azure API Management as your authentication gateway: https://aka.ms/mcp-remote-apim-auth The Authorization Flow The workflow involves three core components: the MCP client, the APIM Gateway, and the MCP server, with Microsoft Entra managing authentication (AuthN) and authorization (AuthZ). Using the OAuth protocol, the client starts by calling the APIM Gateway, which redirects the user to Entra for login and consent. Once authenticated, Entra provides an access token to the Gateway, which then exchanges a code with the client to generate an MCP server token. This token allows the client to communicate securely with the server via the Gateway, ensuring user validation and scope verification. Finally, the MCP server establishes a session key for ongoing communication through a dedicated message endpoint. Diagram source: https://aka.ms/mcp-remote-apim-auth-diagram Conclusion Azure API Management (APIM) is an essential tool for enterprise customers looking to integrate AI models with external tools using the Model Context Protocol (MCP). In this blog, we've emphasized the simplicity of connecting AI agents to various data sources through MCP, streamlining previously complex implementations. Given the critical role of secure access to platforms and services for AI agents, APIM offers robust solutions for managing OAuth tokens and ensuring secure access to protected endpoints, making it an invaluable asset for enterprises, despite the challenges of authentication. API Management: An Enterprise Solution for Securing MCP Servers Azure API Management is an essential tool for enterprise customers looking to integrate AI models with external tools using the Model Context Protocol (MCP). It is designed to help you to securely expose your remote MCP servers. MCP servers are still very new, and as the technology evolves, API Management provides an enterprise-ready solution that will evolve with the latest technology. Stay tuned for further feature announcements soon! Acknowledgments This post and work was made possible thanks to the hard work and dedication of our incredible team. Special thanks to Pranami Jhawar, Julia Kasper, Julia Muiruri, Annaji Sharma Ganti Jack Pa, Chaoyi Yuan and Alex Vieira for their invaluable contributions. Additional Resources MCP Client Server integration with APIM as AI gateway Blog Post: https://aka.ms/remote-mcp-apim-auth-blog Sequence Diagram: https://aka.ms/mcp-remote-apim-auth-diagram APIM lab: https://aka.ms/ai-gateway-lab-mcp-client-auth Python: https://aka.ms/mcp-remote-apim-auth .NET: https://aka.ms/mcp-remote-apim-auth-dotnet On-Behalf-Of Authorization: https://aka.ms/mcp-obo-sample 3rd Party APIs – Backend Auth via Credential Manager: Blog Post: https://aka.ms/remote-mcp-apim-lab-blog APIM lab: https://aka.ms/ai-gateway-lab-mcp YouTube Video: https://aka.ms/ai-gateway-lab-demo20KViews12likes4CommentsExpanding the Public Preview of the Azure SRE Agent
We are excited to share that the Azure SRE Agent is now available in public preview for everyone instantly – no sign up required. A big thank you to all our preview customers who provided feedback and helped shape this release! Watching teams put the SRE Agent to work taught us a ton, and we’ve baked those lessons into a smarter, more resilient, and enterprise-ready experience. You can now find Azure SRE Agent directly in the Azure Portal and get started, or use the link below. 📖 Learn more about SRE Agent. 👉 Create your first SRE Agent (Azure login required) What’s New in Azure SRE Agent - October Update The Azure SRE Agent now delivers secure-by-default governance, deeper diagnostics, and extensible automation—built for scale. It can even resolve incidents autonomously by following your team’s runbooks. With native integrations across Azure Monitor, GitHub, ServiceNow, and PagerDuty, it supports root cause analysis using both source code and historical patterns. And since September 1, billing and reporting are available via Azure Agent Units (AAUs). Please visit product documentation for the latest updates. Here are a few highlights for this month: Prioritizing enterprise governance and security: By default, the Azure SRE Agent operates with least-privilege access and never executes write actions on Azure resources without explicit human approval. Additionally, it uses role-based access control (RBAC) so organizations can assign read-only or approver roles, providing clear oversight and traceability from day one. This allows teams to choose their desired level of autonomy from read-only insights to approval-gated actions to full automation without compromising control. Covering the breadth and depth of Azure: The Azure SRE Agent helps teams manage and understand their entire Azure footprint. With built-in support for AZ CLI and kubectl, it works across all Azure services. But it doesn’t stop there—diagnostics are enhanced for platforms like PostgreSQL, API Management, Azure Functions, AKS, Azure Container Apps, and Azure App Service. Whether you're running microservices or managing monoliths, the agent delivers consistent automation and deep insights across your cloud environment. Automating Incident Management: The Azure SRE Agent now plugs directly into Azure Monitor, PagerDuty, and ServiceNow to streamline incident detection and resolution. These integrations let the Agent ingest alerts and trigger workflows that match your team’s existing tools—so you can respond faster, with less manual effort. Engineered for extensibility: The Azure SRE Agent incident management approach lets teams reuse existing runbooks and customize response plans to fit their unique workflows. Whether you want to keep a human in the loop or empower the Agent to autonomously mitigate and resolve issues, the choice is yours. This flexibility gives teams the freedom to evolve—from guided actions to trusted autonomy—without ever giving up control. Root cause, meet source code: The Azure SRE Agent now supports code-aware root cause analysis (RCA) by linking diagnostics directly to source context in GitHub and Azure DevOps. This tight integration helps teams trace incidents back to the exact code changes that triggered them—accelerating resolution and boosting confidence in automated responses. By bridging operational signals with engineering workflows, the agent makes RCA faster, clearer, and more actionable. Close the loop with DevOps: The Azure SRE Agent now generates incident summary reports directly in GitHub and Azure DevOps—complete with diagnostic context. These reports can be assigned to a GitHub Copilot coding agent, which automatically creates pull requests and merges validated fixes. Every incident becomes an actionable code change, driving permanent resolution instead of temporary mitigation. Getting Started Start here: Create a new SRE Agent in the Azure portal (Azure login required) Blog: Announcing a flexible, predictable billing model for Azure SRE Agent Blog: Enterprise-ready and extensible – Update on the Azure SRE Agent preview Product documentation Product home page Community & Support We’d love to hear from you! Please use our GitHub repo to file issues, request features, or share feedback with the team5.5KViews2likes3CommentsAnnouncing a flexible, predictable billing model for Azure SRE Agent
Billing for Azure SRE Agent will start on September 1, 2025. Announced at Microsoft Build 2025, Azure SRE Agent is a pre-built AI agent for root cause analysis, uptime improvement, and operational cost reduction. Learn more about the billing model and example scenarios.3.5KViews1like1Comment