azure paas
Why Does Azure App Service Return HTTP 404?
When an application deployed to Azure App Service suddenly starts returning HTTP 404 – Not Found, it can be confusing, especially when:
- The deployment completed successfully
- The App Service shows as Running
- No obvious errors appear in the portal

This behaviour is more common than it appears and is often linked to routing, configuration, or platform behaviour.

In this article, I’ll walk through real-world reasons why Azure App Service can return HTTP 404 errors. The goal is to help you systematically isolate the root cause, whether it’s application-level, configuration-related, or platform-specific.

What Does HTTP 404 Mean in Azure App Service?
An HTTP 404 response from Azure App Service means that the incoming request successfully reached Azure App Service, but neither the platform nor the application could locate the requested resource. This distinction is important. Unlike connectivity or DNS issues, a 404 confirms that:
- DNS resolution worked
- The request hit the App Service front end
- The failure happened after request routing

1. Incorrect Application URL or Route
This is the most common cause of 404 errors.

Typical scenarios
- Accessing the root URL (https://<app>.azurewebsites.net) for a Web API that exposes only API routes
- Missing route prefixes such as /api, /v1, or controller/action name segments
- Case sensitivity mismatches on Linux App Service

Example
https://myapp.azurewebsites.net returns 404, but https://myapp.azurewebsites.net/weatherforecast works as expected.

✅ Tip: Always validate your routing locally and confirm that the exact same path is being accessed in Azure.

2. Application Appears Running, but Startup Failed Partially
It is possible for an App Service to show Running even when the application failed to initialize fully.
Common causes
- Missing or incorrect environment variables
- Invalid connection strings
- Exceptions thrown during Program.cs / Startup.cs
- Dependency initialization failures at startup

In such scenarios, the app may start the host process but fail to register routes, resulting in 404 responses instead of 500 errors.

✅ Where to check
- Application logs
- Deployment logs
- Kudu → LogFiles

3. Static Files Not Found or Not Being Served
For applications hosting static content (HTML, JavaScript, images, JSON files), a 404 can occur even when the files exist.

Common reasons
- Files not deployed to the expected directory (wwwroot, /home/site/wwwroot)
- Missing or unsupported MIME type configuration (commonly seen with .json)
- Static file middleware not enabled in ASP.NET Core applications

✅ Quick validation: Deploy a simple test.html to wwwroot and try accessing it directly.

4. Windows vs Linux App Service Differences
Behaviour can differ significantly between Windows App Service and Linux App Service.

Common pitfalls on Linux
- Case-sensitive file paths (Index.html ≠ index.html)
- Missing or incorrect startup command
- Differences in request routing handled by Nginx

✅ Tip: If the app works on Windows App Service but fails on Linux, always recheck file casing and startup configuration first.

5. Custom Domain and Networking Configuration Issues
In some cases, requests reach the App Service but fail due to domain or network constraints.

Possible causes
- Incorrect custom domain binding

✅ Isolation step: Always test using the default *.azurewebsites.net hostname to determine whether the issue is domain-specific.

6. Health Checks or Monitoring Probes Targeting Invalid Paths
Seeing periodic 404 entries in logs (for example, every few minutes) is often a sign of misconfigured probes.

Typical scenarios
- App Service Health Check configured with a non-existent endpoint
- External monitoring tools probing /health or other paths that do not exist

✅ Fix: Ensure the health check path maps to a valid endpoint implemented by the application.
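The Linux case-sensitivity pitfall described above (Index.html ≠ index.html) can be checked mechanically against a file listing taken from Kudu. The following is a minimal Python sketch, not an official tool: the function name `find_case_mismatch` and the sample file list are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Optional


def find_case_mismatch(requested: str, deployed: Iterable[str]) -> Optional[str]:
    """Return the deployed path that matches `requested` only when case is ignored.

    Returns None when there is an exact match or no match at all.
    """
    deployed = list(deployed)
    if requested in deployed:
        return None  # exact match: casing is not the problem
    lowered = requested.lower()
    for path in deployed:
        if path.lower() == lowered:
            # Same name with different casing: this is the kind of request
            # that can resolve on Windows but return 404 on Linux.
            return path
    return None


# Hypothetical listing of files under /home/site/wwwroot
files = ["Index.html", "css/site.css", "data/config.json"]
print(find_case_mismatch("index.html", files))
```

A non-None result suggests that the request fails because of casing rather than a missing file, which narrows the investigation to the URL or the deployed file name.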
7. Missing or Corrupted Deployment Artifacts
Even when deployments report success, application files may not be where the runtime expects them.

Commonly observed with
- Zip deployments
- WEBSITE_RUN_FROM_PACKAGE misconfigurations
- Partial or interrupted deployments

✅ Verify using Kudu: Browse /home/site/wwwroot and check that the files are present.

Quick Troubleshooting Checklist
If your Azure App Service is returning HTTP 404:
- Verify the exact URL and route
- Test hostingstart.html or a static file (for example, /hostingstart.html)
- Review startup and application logs
- Inspect deployed artifacts via Kudu
- Validate Windows vs Linux behaviour differences
- Review networking, authentication, and health check settings

8. Application Gateway in front of App Service
If you have Application Gateway in front of App Service, check the rewrite rules so that requests are forwarded to the correct path.

Final Thoughts
HTTP 404 errors on Azure App Service are rarely random. In most cases, they point to:
- Routing mismatches
- Startup or configuration failures
- Platform-specific behaviour differences

By breaking the investigation into platform → configuration → application, you can systematically narrow down the root cause and resolve the issue. Happy debugging 🚀

Microsoft 365 multi-agent workflow with Microsoft Agent Framework
Learn how to design and run a multi-agent workflow with Microsoft Agent Framework: from building a coordinated set of specialized agents and tools, to hosting and deploying them with Azure AI Foundry, and finally exposing the same workflow to users in Microsoft 365 (Teams or Copilot). This walkthrough demonstrates a practical end-to-end pattern for orchestrating agents, adding tools, and packaging the solution for real-world applications.

Managing Multi-Tenant Azure Resource with SRE Agent and Lighthouse
Azure SRE Agent is an AI-powered reliability assistant that helps teams diagnose and resolve production issues faster while reducing operational toil. It analyzes logs, metrics, alerts, and deployment data to perform root cause analysis and to recommend or execute mitigations with human approval. It can integrate with Azure services across the subscriptions and resource groups that you need to monitor and manage.

Today’s enterprise customers live in a multi-tenant world, for reasons that include acquisitions, complex corporate structures, managed service providers, and IT partners. Azure Lighthouse enables enterprise IT teams and managed service providers to manage resources across multiple Azure tenants from a single control plane. In this demo I will walk you through how to set up Azure SRE Agent to manage and monitor multi-tenant resources delegated through Azure Lighthouse.

Navigate to Azure SRE Agent and select Create agent. Fill in the required details along with the deployment region and deploy the SRE agent. Once the deployment is complete, select Set up your agent. Select the Azure resources you would like your agent to analyze, such as resource groups or subscriptions. This opens a pop-up window in which you can select the subscriptions and resource groups, under the same tenant, that you want SRE Agent to monitor and manage.

So far so good 👍 As a Managed Service Provider (MSP), however, you manage multiple tenants via Azure Lighthouse, and you need SRE Agent to have access to those as well. To demo this, we need to set up Azure Lighthouse with the correct set of roles and configuration to delegate access to the management subscription where the centralized SRE agent is running.

In the Azure portal, search for Lighthouse. Navigate to the Lighthouse home page and select Manage your customers.
On the My customers Overview page, select Create ARM Template. Provide a Name and Description. For Delegated scope, select Subscription. Select + Add authorization, which takes you to the Add authorization window. Select the Principal type; I am selecting User for demo purposes. The pop-up window lets you select users from the list. Select the checkbox next to the user to whom you want to delegate the subscription and choose Select. Then select the Role that you would like to assign to the user from the managing tenant in the delegated tenant, and select Add. You can add multiple roles by adding additional authorizations for the selected user.

This step is important: the delegated tenant must be assigned the right role for SRE Agent to add it as an Azure source. Azure SRE Agent requires an Owner or User Access Administrator RBAC role to assign the subscription to the list of managed resources. If an appropriate role is not assigned, you will see an error when selecting the delegated subscriptions in SRE Agent Managed resources. According to Lighthouse role support, the Owner role isn’t supported, and the User Access Administrator role is supported only for a limited purpose. Refer to the Azure Lighthouse documentation for additional information.

If the role is not defined correctly, you might see an error stating:
🛑 Failed to add Role assignment “The 'delegatedRoleDefinitionIds' property is required when using certain roleDefinitionIds for authorization.”

To allow a principalId to assign roles to a managed identity in the customer tenant, set its roleDefinitionId to User Access Administrator. Download the ARM template and, in the delegatedRoleDefinitionIds property, add the specific Azure built-in roles that you want to grant. You can include any supported Azure built-in role except for User Access Administrator or Owner.
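The role rules above can be expressed as a small pre-flight check to run against an authorization entry before uploading the template. This is only a sketch under stated assumptions: `check_authorization` is a hypothetical helper, not part of any Azure SDK, and the Owner role ID is taken from the Azure built-in roles list rather than from this article, so it is worth verifying against the current documentation.

```python
from typing import Dict, List

# Built-in role definition IDs. The User Access Administrator ID appears in
# the article's example; the Owner ID is an assumption from the built-in roles list.
USER_ACCESS_ADMINISTRATOR = "18d7d88d-d35e-4fb5-a5c3-7773c20a72d9"
OWNER = "8e3af657-a8ff-443c-a75c-2fe8c4bcb635"


def check_authorization(auth: Dict) -> List[str]:
    """Return a list of human-readable problems found in one authorization entry."""
    problems = []
    role = auth.get("roleDefinitionId")
    delegated = auth.get("delegatedRoleDefinitionIds")

    if role == OWNER:
        problems.append("Owner is not supported for Lighthouse delegation")

    if role == USER_ACCESS_ADMINISTRATOR:
        if not delegated:
            problems.append(
                "delegatedRoleDefinitionIds is required with User Access Administrator"
            )
        else:
            for role_id in delegated:
                if role_id in (OWNER, USER_ACCESS_ADMINISTRATOR):
                    problems.append(
                        f"{role_id} cannot be listed in delegatedRoleDefinitionIds"
                    )
    return problems


# Entry shaped like the article's authorization example
entry = {
    "principalId": "00000000-0000-0000-0000-000000000000",
    "roleDefinitionId": USER_ACCESS_ADMINISTRATOR,
    "delegatedRoleDefinitionIds": [
        "b24988ac-6180-42a0-ab88-20f7382dd24c",  # Contributor
        "92aaf0da-9dab-42b6-94a3-d43ce8d16293",  # Log Analytics Contributor
    ],
}
print(check_authorization(entry))
```

An empty list means the entry satisfies the three rules stated above. It does not guarantee that the deployment will succeed.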
This example shows a principalId with the User Access Administrator role that can assign two built-in roles to managed identities in the customer tenant: Contributor and Log Analytics Contributor.

    {
      "principalId": "00000000-0000-0000-0000-000000000000",
      "principalIdDisplayName": "Policy Automation Account",
      "roleDefinitionId": "18d7d88d-d35e-4fb5-a5c3-7773c20a72d9",
      "delegatedRoleDefinitionIds": [
        "b24988ac-6180-42a0-ab88-20f7382dd24c",
        "92aaf0da-9dab-42b6-94a3-d43ce8d16293"
      ]
    }

In addition, SRE Agent requires certain roles at the managed identity level in order to access and operate on those services. Locate the SRE agent's user-assigned managed identity and add roles to its service principal. For demo purposes I am assigning the Reader, Monitoring Reader, and Log Analytics Reader roles.

Here is the sample ARM template used for this demo.

    {
      "$schema": "https://schema.management.azure.com/schemas/2019-08-01/subscriptionDeploymentTemplate.json#",
      "contentVersion": "1.0.0.0",
      "parameters": {
        "mspOfferName": {
          "type": "string",
          "metadata": { "description": "Specify a unique name for your offer" },
          "defaultValue": "lighthouse-sre-demo"
        },
        "mspOfferDescription": {
          "type": "string",
          "metadata": { "description": "Name of the Managed Service Provider offering" },
          "defaultValue": "lighthouse-sre-demo"
        }
      },
      "variables": {
        "mspRegistrationName": "[guid(parameters('mspOfferName'))]",
        "mspAssignmentName": "[guid(parameters('mspOfferName'))]",
        "managedByTenantId": "6e03bca1-4300-400d-9e80-000000000000",
        "authorizations": [
          {
            "principalId": "504adfc5-da83-47d4-8709-000000000000",
            "roleDefinitionId": "e40ec5ca-96e0-45a2-b4ff-59039f2c2b59",
            "principalIdDisplayName": "Pranab Mandal"
          },
          {
            "principalId": "504adfc5-da83-47d4-8709-000000000000",
            "roleDefinitionId": "18d7d88d-d35e-4fb5-a5c3-7773c20a72d9",
            "delegatedRoleDefinitionIds": [
              "b24988ac-6180-42a0-ab88-20f7382dd24c",
              "92aaf0da-9dab-42b6-94a3-d43ce8d16293"
            ],
            "principalIdDisplayName": "Pranab Mandal"
          },
          {
            "principalId": "504adfc5-da83-47d4-8709-000000000000",
            "roleDefinitionId": "b24988ac-6180-42a0-ab88-20f7382dd24c",
            "principalIdDisplayName": "Pranab Mandal"
          },
          {
            "principalId": "0374ff5c-5272-49fa-878a-000000000000",
            "roleDefinitionId": "acdd72a7-3385-48ef-bd42-f606fba81ae7",
            "principalIdDisplayName": "sre-agent-ext-sub1-4n4y4v5jjdtuu"
          },
          {
            "principalId": "0374ff5c-5272-49fa-878a-000000000000",
            "roleDefinitionId": "43d0d8ad-25c7-4714-9337-8ba259a9fe05",
            "principalIdDisplayName": "sre-agent-ext-sub1-4n4y4v5jjdtuu"
          },
          {
            "principalId": "0374ff5c-5272-49fa-878a-000000000000",
            "roleDefinitionId": "73c42c96-874c-492b-b04d-ab87d138a893",
            "principalIdDisplayName": "sre-agent-ext-sub1-4n4y4v5jjdtuu"
          }
        ]
      },
      "resources": [
        {
          "type": "Microsoft.ManagedServices/registrationDefinitions",
          "apiVersion": "2022-10-01",
          "name": "[variables('mspRegistrationName')]",
          "properties": {
            "registrationDefinitionName": "[parameters('mspOfferName')]",
            "description": "[parameters('mspOfferDescription')]",
            "managedByTenantId": "[variables('managedByTenantId')]",
            "authorizations": "[variables('authorizations')]"
          }
        },
        {
          "type": "Microsoft.ManagedServices/registrationAssignments",
          "apiVersion": "2022-10-01",
          "name": "[variables('mspAssignmentName')]",
          "dependsOn": [
            "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
          ],
          "properties": {
            "registrationDefinitionId": "[resourceId('Microsoft.ManagedServices/registrationDefinitions/', variables('mspRegistrationName'))]"
          }
        }
      ],
      "outputs": {
        "mspOfferName": {
          "type": "string",
          "value": "[concat('Managed by', ' ', parameters('mspOfferName'))]"
        },
        "authorizations": {
          "type": "array",
          "value": "[variables('authorizations')]"
        }
      }
    }

Log in to the customer's tenant and navigate to Service providers in the Azure portal. From the Service providers overview screen, select Service provider offers in the left navigation pane. From the top menu, open the Add offer drop-down and select Add via template.
In the Upload Offer Template window, drag and drop or upload the template file that was created in the earlier step and select Upload. Once the file is uploaded, select Review + Create. The template takes a few minutes to deploy, after which a successful deployment page should be displayed. Navigate to Delegations from the Lighthouse overview and confirm that you see the delegated subscription and the assigned role.

Once the Lighthouse delegation is set up, sign in to the managing tenant and navigate to the deployed SRE agent. Navigate to Azure resources from the top menu or via Settings > Managed resources. Select Add subscriptions to choose the customer subscriptions that you need SRE Agent to manage. Adding a subscription automatically adds the required permissions for the agent. Once the appropriate roles are added, the subscriptions are ready for the agent to manage and monitor the resources within them.

Summary - Benefits
This blog post demonstrates how Azure SRE Agent can be used to centrally monitor and manage Azure resources across multiple tenants by integrating it with Azure Lighthouse, a common requirement for enterprises and managed service providers operating in complex, multi-tenant environments. The benefits are:
- Centralized SRE operations across multiple Azure tenants
- Secure, role-based access using delegated resource management
- Reduced operational overhead for MSPs and enterprise IT teams
- Unified visibility into resource health and reliability across customer environments

Using an AI Agent to Troubleshoot and Fix Azure Function App Issues
TOC
- Preparation
- Troubleshooting Workflow
- Conclusion

Preparation

Topic: Required tools
- AI agent: for example, Copilot CLI / OpenCode / Hermes / OpenClaw, etc. In this example, we use Copilot CLI.
- Model access: for example, Anthropic Claude Opus.
- Relevant skills: this example does not use skills, but using relevant skills can speed up troubleshooting.

Topic: Compliance with your organization
Enterprise-level projects are sensitive, so you must confirm with the appropriate stakeholders before using an AI agent on them. Enterprise environments may also have strict standards for AI agent usage.

Topic: Network limitations
- If the process involves restarting the Function App container or changing settings that trigger a restart, communication between the user and the agent may be interrupted, and you will need to use /resume.
- If the agent needs internet access for investigation, the app must have outbound connectivity.
- If the Kudu container cannot be used because of network issues, this type of investigation cannot be carried out.

Topic: Permission limitations
- If you are using Azure blessed images, according to the official documentation, the containers use the fixed password Docker!. However, if you are using a custom container, you will need to provide an additional login method.
- For resources the agent does not already have permission to investigate, you will need to enable a system-assigned managed identity (SAMI) and assign the appropriate RBAC roles.

Troubleshooting Workflow
Let’s use a classic case where an HTTP trigger cannot be tested from the Azure Portal. When clicking Test/Run in the Azure Portal, an error message appears. At the same time, however, the home page does not show any abnormal status. At this point, we first obtain the Function App’s SAMI and assign it the Owner role for the entire resource group. This is only for demonstration purposes.
In practice, you should follow the principle of least privilege and scope permissions down to only the specific resources and operations that are actually required.

Next, go to the Kudu container, which is the always-on maintenance container dedicated to the app. Install and enable Copilot CLI. Then describe the problem you are encountering. After the agent processes the issue and interacts with you further, it can generate a reasonable investigation report. In this example, it appears that the Function App’s Storage Account access key had been rotated previously, but the Function App had not updated the corresponding environment variable.

Once we understand the issue, we could perform the follow-up actions ourselves. However, to demonstrate the agent’s capabilities, you can also allow it to fix the problem directly, provided that you have granted the corresponding permissions through SAMI. During the process, the container restart will disconnect the session, so you will need to return to the Kudu container and resume the previous session so that it can continue. Finally, the agent informs you that the issue has been fixed, and you can validate the result. In this case, the validation shows that the repair was successful.

Conclusion
After each repair, we can extract the experience from that case into a skill and store it in a Storage Account for future reuse. This reduces the agent’s initial investigation time for similar issues and saves tokens, which makes both time and cost management more efficient.

Gemma 4 on Azure Container Apps Serverless GPU
Every prompt you send to a hosted AI service leaves your tenant. Your code, your architecture decisions, your proprietary logic: all of it crosses a network boundary you don't control. For teams building in regulated industries or handling sensitive IP, that's not a philosophical concern. It's a compliance blocker.

What if you could spin up a fully private AI coding agent, running on your own GPU in your own Azure subscription, with a single command? That's exactly what this template does. One azd up, 15 minutes, and you have Google's Gemma 4 running on Azure Container Apps serverless GPU with an OpenAI-compatible API, protected by auth, and ready to power OpenCode as your terminal-based coding agent. No data leaves your environment. No third-party model provider sees your code. Full control.

Why Self-Hosted AI on ACA?
Azure Container Apps serverless GPU gives you on-demand GPU compute without managing VMs, Kubernetes clusters, or GPU drivers. You get a container, a GPU, and an HTTPS endpoint, and Azure handles the rest. Here's what makes this approach different from calling a hosted model API:
- Complete data privacy: your code and prompts never leave your Azure subscription. No PII exposure, no data leakage, no third-party processing. For teams navigating HIPAA, SOC 2, or internal IP policies, this is the simplest path to compliant AI-assisted development.
- Predictable costs: you pay for GPU compute time, not per-token. Run as many prompts as you want against your deployed model.
- No rate limits: the GPU is yours. No throttling, no queue, no waiting for capacity.
- Model flexibility: swap models in minutes. Start with the 4B parameter Gemma 4 for fast iteration, scale up to 26B for complex reasoning tasks.

This isn't a tradeoff between convenience and privacy. ACA serverless GPU makes self-hosted AI as easy to deploy as any SaaS endpoint, but the data stays yours.
What You're Building
The template deploys two containers into an Azure Container Apps environment:
- Ollama + Gemma 4: running on a serverless GPU (NVIDIA T4 or A100), serving an OpenAI-compatible API
- Nginx auth proxy: a lightweight reverse proxy that adds basic authentication and exposes the endpoint over HTTPS

The Ollama container pulls the Gemma 4 model on first start, so there's nothing to pre-build or upload. The nginx proxy runs on the free Consumption profile; only the Ollama container needs a GPU. After deployment, you get a single HTTPS endpoint that works with curl, any OpenAI-compatible SDK, or OpenCode, a terminal-based AI coding agent that turns the whole thing into a private GitHub Copilot alternative.

Step 1: Deploy with azd up
You need the Azure CLI and Azure Developer CLI (azd) installed.

    git clone https://github.com/simonjj/gemma4-on-aca.git
    cd gemma4-on-aca
    azd up

The setup walks you through three choices:
- GPU selection: T4 (16 GB VRAM) for smaller models, or A100 (80 GB VRAM) for the full Gemma 4 lineup.
- Model selection: depends on your GPU choice. The defaults are tuned for the best quality-to-speed ratio on each GPU tier.
- Proxy password: protects your endpoint with basic auth.

Region availability: Serverless GPUs are available in various regions such as australiaeast, brazilsouth, canadacentral, eastus, italynorth, swedencentral, uksouth, westus, and westus3. Pick one of these when prompted for location.

That's it. Provisioning takes about 10 minutes, mostly waiting for the ACA environment to be created and the model to download.

Choose Your Model
Gemma 4 ships in four sizes.
The right choice depends on your GPU and workload:

| Model | Params | Architecture | Context | Modalities | Disk Size |
|---|---|---|---|---|---|
| gemma4:e2b | ~2B | Dense | 128K | Text, Image, Audio | ~7 GB |
| gemma4:e4b | ~4B | Dense | 128K | Text, Image, Audio | ~10 GB |
| gemma4:26b | 26B | MoE (4B active) | 256K | Text, Image | ~18 GB |
| gemma4:31b | 31B | Dense | 256K | Text, Image | ~20 GB |

Real-World Performance on ACA
We benchmarked every model on both GPU tiers using Ollama v0.20 with Q4_K_M quantization and 32K context in Sweden Central:

| Model | GPU | Tokens/sec | TTFT | Notes |
|---|---|---|---|---|
| gemma4:e2b | T4 | ~81 | ~15 ms | Fastest on T4 |
| gemma4:e4b | T4 | ~51 | ~17 ms | Default T4 choice, best quality/speed |
| gemma4:e2b | A100 | ~184 | ~9 ms | Ultra-fast |
| gemma4:e4b | A100 | ~129 | ~12 ms | Great for lighter workloads |
| gemma4:26b | A100 | ~113 | ~14 ms | Default A100 choice, strong reasoning |
| gemma4:31b | A100 | ~40 | ~30 ms | Highest quality, slower |

51 tokens/second on a T4 with the 4B model is fast enough for interactive coding assistance. The 26B model on A100 delivers 113 tokens/second with noticeably better reasoning, which is ideal for complex refactoring, architecture questions, and multi-file changes. The 26B and 31B models require A100; they don't fit in the T4's 16 GB of VRAM.

Step 2: Verify Your Endpoint
After azd up completes, the post-provision hook prints your endpoint URL. Test it:

    curl -u admin:<YOUR_PASSWORD> \
      https://<YOUR_PROXY_ENDPOINT>/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{
        "model": "gemma4:e4b",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

You should get a JSON response with Gemma 4's reply. The endpoint is fully OpenAI-compatible: it works with any tool or SDK that speaks the OpenAI API format.

Step 3: Connect OpenCode
Here's where it gets powerful. OpenCode is a terminal-based AI coding agent: think GitHub Copilot, but running in your terminal and pointing at whatever model backend you choose. The azd up post-provision hook automatically generates an opencode.json in your project directory with the correct endpoint and credentials.
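For readers who prefer to script this step, here is a minimal Python sketch of how such a file could be produced. It is not the template's actual hook: `build_opencode_config` is a hypothetical helper, the endpoint and password are placeholders, and the field layout simply mirrors the provider structure used in this article's manual configuration.

```python
import base64
import json


def build_opencode_config(endpoint: str, password: str,
                          model: str = "gemma4:e4b", user: str = "admin") -> dict:
    """Build an OpenCode provider config that uses HTTP Basic auth."""
    # Basic auth is the Base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {
        "$schema": "https://opencode.ai/config.json",
        "provider": {
            "gemma4-aca": {
                "npm": "@ai-sdk/openai-compatible",
                "name": "Gemma 4 on ACA",
                "options": {
                    "baseURL": f"https://{endpoint}/v1",
                    "headers": {"Authorization": f"Basic {token}"},
                },
                # The display name here is illustrative only.
                "models": {model: {"name": f"Gemma 4 {model.split(':')[-1]}"}},
            }
        },
    }


# Placeholder values; substitute your own endpoint and proxy password.
config = build_opencode_config("example.invalid", "YOUR_PASSWORD")
print(json.dumps(config, indent=2))
```

Writing the result to opencode.json with `json.dump` would give OpenCode the same provider entry. Because the password is embedded in a reversible Base64 header, the file should be treated as a secret.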
If you need to create it manually:

    {
      "$schema": "https://opencode.ai/config.json",
      "provider": {
        "gemma4-aca": {
          "npm": "@ai-sdk/openai-compatible",
          "name": "Gemma 4 on ACA",
          "options": {
            "baseURL": "https://<YOUR_PROXY_ENDPOINT>/v1",
            "headers": {
              "Authorization": "Basic <BASE64_OF_admin:YOUR_PASSWORD>"
            }
          },
          "models": {
            "gemma4:e4b": { "name": "Gemma 4 e4b (4B)" }
          }
        }
      }
    }

Generate the Base64 value:

    echo -n "admin:YOUR_PASSWORD" | base64

Now run it:

    opencode run -m "gemma4-aca/gemma4:e4b" "Write a binary search in Rust"

That command sends your prompt to Gemma 4 running on your ACA GPU and streams the response back to your terminal. Every token is generated on your infrastructure. Nothing leaves your subscription.

For interactive sessions, launch the TUI:

    opencode

Select your model with /models, pick Gemma 4, and start coding. OpenCode supports file editing, code generation, refactoring, and multi-turn conversations, all powered by your private Gemma 4 instance.

The Privacy Case
This matters most for teams that can't send code to external APIs:
- HIPAA-regulated healthcare apps: patient data in code, schema definitions, and test fixtures stays in your Azure subscription
- Financial services: proprietary trading algorithms and risk models never leave your network boundary
- Defense and government: classified or CUI-adjacent codebases get AI assistance without external data processing agreements
- Startups with sensitive IP: your secret sauce stays secret, even while you use AI to build faster

With ACA serverless GPU, you're not running a VM or managing a Kubernetes cluster to get this privacy. It's a managed container with a GPU attached. Azure handles the infrastructure, you own the data boundary.

Clean Up
When you're done:

    azd down

This tears down all Azure resources. Since ACA serverless GPU bills only while your containers are running, you can also scale to zero replicas to pause costs without destroying the environment.
Get Started
- 📖 gemma4-on-aca on GitHub: clone it, run azd up, and you're live
- 🤖 OpenCode: the terminal AI agent that connects to your Gemma 4 endpoint
- 📌 Gemma 4 docs: model architecture and capabilities
- 📌 ACA serverless GPU: GPU regions and workload profile details

The Agent that investigates itself
Azure SRE Agent handles tens of thousands of incident investigations each week for internal Microsoft services and external teams running it for their own systems. Last month, one of those incidents was about the agent itself. Our KV cache hit rate alert started firing. Cached token percentage was dropping across the fleet. We didn't open dashboards. We simply asked the agent. It spawned parallel subagents, searched logs, read through its own source code, and produced the analysis. First finding: Claude Haiku at 0% cache hits. The agent checked the input distribution and found that the average call was ~180 tokens, well below Anthropic’s 4,096-token minimum for Haiku prompt caching. Structurally, these requests could never be cached. They were false positives. The real regression was in Claude Opus: cache hit rate fell from ~70% to ~48% over a week. The agent correlated the drop against the deployment history and traced it to a single PR that restructured prompt ordering, breaking the common prefix that caching relies on. It submitted two fixes: one to exclude all uncacheable requests from the alert, and the other to restore prefix stability in the prompt pipeline. That investigation is how we develop now. We rarely start with dashboards or manual log queries. We start by asking the agent. Three months earlier, it could not have done any of this. The breakthrough was not building better playbooks. It was harness engineering: enabling the agent to discover context as the investigation unfolded. This post is about the architecture decisions that made it possible. Where we started In our last post, Context Engineering for Reliable AI Agents: Lessons from Building Azure SRE Agent, we described how moving to a single generalist agent unlocked more complex investigations. The resolution rates were climbing, and for many internal teams, the agent could now autonomously investigate and mitigate roughly 50% of incidents. We were moving in the right direction. 
But the scores weren't uniform, and when we dug into why, the pattern was uncomfortable. The high-performing scenarios shared a trait: they'd been built with heavy human scaffolding. They relied on custom response plans for specific incident types, hand-built subagents for known failure modes, and pre-written log queries exposed as opaque tools. We weren’t measuring the agent’s reasoning – we were measuring how much engineering had gone into the scenario beforehand. On anything new, the agent had nowhere to start. We found these gaps through manual review. Every week, engineers read through lower-scored investigation threads and pushed fixes: tighten a prompt, fix a tool schema, add a guardrail. Each fix was real. But we could only review fifty threads a week. The agent was handling ten thousand. We were debugging at human speed. The gap between those two numbers was where our blind spots lived. We needed an agent powerful enough to take this toil off us. An agent which could investigate itself. Dogfooding wasn't a philosophy - it was the only way to scale. The Inversion: Three bets The problem we faced was structural - and the KV cache investigation shows it clearly. The cache rate drop was visible in telemetry, but the cause was not. The agent had to correlate telemetry with deployment history, inspect the relevant code, and reason over the diff that broke prefix stability. We kept hitting the same gap in different forms: logs pointing in multiple directions, failure modes in uninstrumented paths, regressions that only made sense at the commit level. Telemetry showed symptoms, but not what actually changed. We'd been building the agent to reason over telemetry. We needed it to reason over the system itself. The instinct when agents fail is to restrict them: pre-write the queries, pre-fetch the context, pre-curate the tools. It feels like control. In practice, it creates a ceiling. The agent can only handle what engineers anticipated in advance. 
The answer is an agent that can discover what it needs as the investigation unfolds. In the KV cache incident, each step, from metric anomaly to deployment history to a specific diff, followed from what the previous step revealed. It was not a pre-scripted path. Navigating towards the right context with progressive discovery is key to creating deep agents which can handle novel scenarios. Three architectural decisions made this possible – and each one compounded on the last. Bet 1: The Filesystem as the Agent's World Our first bet was to give the agent a filesystem as its workspace instead of a custom API layer. Everything it reasons over – source code, runbooks, query schemas, past investigation notes – is exposed as files. It interacts with that world using read_file, grep, find, and shell. No SearchCodebase API. No RetrieveMemory endpoint. This is an old Unix idea: reduce heterogeneous resources to a single interface. Coding agents already work this way. It turns out the same pattern works for an SRE agent. Frontier models are trained on developer workflows: navigating repositories, grepping logs, patching files, running commands. The filesystem is not an abstraction layered on top of that prior. It matches it. When we materialized the agent’s world as a repo-like workspace, our human "Intent Met" score - whether the agent's investigation addressed the actual root cause as judged by the on-call engineer - rose from 45% to 75% on novel incidents. But interface design is only half the story. The other half is what you put inside it. Code Repositories: the highest-leverage context Teams had prewritten log queries because they did not trust the agent to generate correct ones. That distrust was justified. Models hallucinate table names, guess column schemas, and write queries against the wrong cluster. But the answer was not tighter restriction. It was better grounding. The repo is the schema. Everything else is derived from it. 
When the agent reads the code that produces the logs, query construction stops being guesswork. It knows the exact exceptions thrown and the conditions under which each path executes. Stack traces start making sense, and logs become legible. But beyond query grounding, code access unlocked three new capabilities that telemetry alone could not provide:

Ground truth over documentation. Docs drift and dashboards show symptoms. The code is what the service actually does. In practice, most investigations only made sense when logs were read alongside implementation.

Point-in-time investigation. The agent checks out the exact commit at incident time, not current HEAD, so it can correlate the failure against the actual diffs. That's what cracked the KV cache investigation: a PR broke prefix stability, and the diff was the only place this was visible. Without commit history, you can't distinguish a code regression from external factors.

Reasoning even where telemetry is absent. Some code paths are not well instrumented. The agent can still trace logic through source and explain behavior even when logs do not exist. This is especially valuable in novel failure modes, the ones most likely to be missed precisely because no one thought to instrument them.

Memory as a filesystem, not a vector store

Our first memory system used RAG over past session learnings. It had a circular dependency: a limited agent learned from limited sessions and produced limited knowledge. Garbage in, garbage out. But the deeper problem was retrieval. In an SRE context, embedding similarity is a weak proxy for relevance. “KV cache regression” and “prompt prefix instability” may be distant in embedding space yet still describe the same causal chain. We tried re-ranking, query expansion, and hybrid search. None fixed the core mismatch between semantic similarity and diagnostic relevance. We replaced RAG with structured Markdown files that the agent reads and writes through its standard tool interface.
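A rough sketch of what navigating such a memory could look like, as opposed to querying it. Everything here is an assumption for illustration: the index.md entry file, the link format, and the choose callback that stands in for the model's decision about which link to follow next.

```python
import re
import tempfile
from pathlib import Path

# Matches relative Markdown links such as [debugging](debugging.md).
LINK = re.compile(r"\[([^\]]+)\]\(([^)\s]+\.md)\)")


def links_in(path):
    """Return {link label: target path} for the Markdown links in one file."""
    text = path.read_text(encoding="utf-8")
    return {label: path.parent / target for label, target in LINK.findall(text)}


def navigate(entry, choose, max_hops=5):
    """Walk memory from an entry file, one link at a time.

    `choose` stands in for the model: it receives the current file's text and
    the available link labels, and returns a label to follow or None to stop.
    """
    visited = [entry]
    current = entry
    for _ in range(max_hops):
        options = links_in(current)
        label = choose(current.read_text(encoding="utf-8"), sorted(options))
        target = options.get(label)
        if target is None or target in visited or not target.exists():
            break
        visited.append(target)
        current = target
    return visited


def build_demo_memory():
    # Hypothetical file names and contents, purely for illustration.
    root = Path(tempfile.mkdtemp())
    (root / "index.md").write_text(
        "# Service memory\n- [logs](logs.md)\n- [debugging](debugging.md)\n",
        encoding="utf-8",
    )
    (root / "logs.md").write_text("# Logs\nQuery patterns go here.\n", encoding="utf-8")
    (root / "debugging.md").write_text(
        "# Debugging\nCache hit rate regressions: check prompt prefix ordering.\n",
        encoding="utf-8",
    )
    return root


def pick_debugging(text, labels):
    # A trivial stand-in policy: follow the debugging link if one is offered.
    return "debugging" if "debugging" in labels else None


if __name__ == "__main__":
    root = build_demo_memory()
    for path in navigate(root / "index.md", pick_debugging):
        print(path.name)
```

No embedding or query string appears anywhere in this sketch. Which file gets read next depends only on what the previous file said, which is the property the text contrasts with retrieval.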
The model names each file semantically: overview.md for a service summary, team.md for ownership and escalation paths, logs.md for cluster access and query patterns, debugging.md for failure modes and prior learnings. Each carries just enough context to orient the agent, with links to deeper files when needed.

The key design choice was to let the model navigate memory, not retrieve it through query matching. The agent starts from a structured entry point and follows the evidence toward what matters. RAG assumes you know the right query before you know what you need. File traversal lets relevance emerge as context accumulates. This removed chunking, overlap tuning, and re-ranking entirely. It also proved more accurate, because frontier models are better at following context than embeddings are at guessing relevance. As a side benefit, memory state can be snapshotted periodically.

One problem remains unsolved: staleness. When two sessions write conflicting patterns to debugging.md, the model must reconcile them. When a service changes behavior, old entries can become misleading. We rely on timestamps and explicit deprecation notes, but we do not have a systemic solution yet. This is an active area of work, and anyone building memory at scale will run into it.

The sandbox as epistemic boundary

The filesystem also defines what the agent can see. If something is not in the sandbox, the agent cannot reason about it. We treat that as a feature, not a limitation. Security boundaries and epistemic boundaries are enforced by the same mechanism. Inside that boundary, the agent has full execution: arbitrary bash, python, jq, and package installs through pip or apt. That scope unlocks capabilities we never would have built as custom tools. It opens PRs with the gh CLI, like the prompt-ordering fix from the KV cache incident. It pushes Grafana dashboards, like a cache-hit-rate dashboard we now track by model. It installs domain-specific CLI tools mid-investigation when needed.
No bespoke integration required, just a shell. The recurring lesson was simple: a generally capable agent in the right execution environment outperforms a specialized agent with bespoke tooling. Custom tools accumulate maintenance costs. Shell commands compose for free. Bet 2: Context Layering Code access tells the agent what a service does. It does not tell the agent what it can access, which resources its tools are scoped to, or where an investigation should begin. This gap surfaced immediately. Users would ask "which team do you handle incidents for?" and the agent had no answer. Tools alone are not enough. An integration also needs ambient context so the model knows what exists, how it is configured, and when to use it. We fixed this with context hooks: structured context injected at prompt construction time to orient the agent before it takes action. Connectors - what can I access? A manifest of wired systems such as Log Analytics, Outlook, and Grafana, along with their configuration. Repositories - what does this system do? Serialized repo trees, plus files like AGENTS.md, Copilot.md, and CLAUDE.md with team-specific instructions. Knowledge map - what have I learned before? A two-tier memory index with a top-level file linking to deeper scenario-specific files, so the model can drill down only when needed. Azure resource topology - where do things live? A serialized map of relationships across subscriptions, resource groups, and regions, so investigations start in the right scope. Together, these context hooks turn a cold start into an informed one. That matters because a bad early choice does not just waste tokens. It sends the investigation down the wrong trajectory. A capable agent still needs to know what exists, what matters, and where to start. Bet 3: Frugal Context Management Layered context creates a new problem: budget. Serialized repo trees, resource topology, connector manifests, and a memory index fill context fast. 
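To get a feel for how quickly those hooks add up, here is a small sketch that assembles a preamble from named hooks and estimates each one's share of the window. The hook contents, the section format, and the four-characters-per-token heuristic are all assumptions for illustration. A real system would use the model's tokenizer and live serialized data.

```python
def estimate_tokens(text):
    # Crude heuristic of roughly four characters per token; real tokenizers differ.
    return max(1, len(text) // 4)


def assemble_preamble(hooks):
    """Render each hook as a labeled section, in the order given."""
    return "\n\n".join(f"[{name}]\n{text}" for name, text in hooks.items())


def budget_report(hooks, window_tokens):
    """Estimate how much of the context window each hook consumes."""
    costs = {name: estimate_tokens(text) for name, text in hooks.items()}
    total = sum(costs.values())
    return costs, total, total / window_tokens


if __name__ == "__main__":
    # Hypothetical hook contents; real ones would be serialized from live systems.
    hooks = {
        "connectors": "log-analytics: workspace=example\ngrafana: url=example",
        "repositories": "src/\n  cache/\n  api/\nAGENTS.md",
        "knowledge_map": "index.md -> debugging.md, logs.md",
        "resource_topology": "subscription/example -> rg/example -> region/example",
    }
    print(assemble_preamble(hooks))
    costs, total, share = budget_report(hooks, window_tokens=8000)
    print(costs, total, f"{share:.2%}")
```

Even a report this simple makes the tradeoff visible: every hook that helps the agent start in the right place also spends budget before the investigation has read a single log line.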
Once the agent starts reading source files and logs, complex incidents hit context limits. We needed our context usage to be deliberately frugal. Tool result compression via the filesystem Large tool outputs are expensive because they consume context before the agent has extracted any value from them. In many cases, only a small slice or a derived summary of that output is actually useful. Our framework exposes these results as files to the agent. The agent can then use tools like grep, jq, or python to process them outside the model interface, so that only the final result enters context. The filesystem isn't just a capability abstraction - it's also a budget management primitive. Context Pruning and Auto Compact Long investigations accumulate dead weight. As hypotheses narrow, earlier context becomes noise. We handle this with two compaction strategies. Context Pruning runs mid-session. When context usage crosses a threshold, we trim or drop stale tool calls and outputs - keeping the window focused on what still matters. Auto-Compact kicks in when a session approaches its context limit. The framework summarizes findings and working hypotheses, then resumes from that summary. From the user's perspective, there's no visible limit. Long investigations just work. Parallel subagents The KV cache investigation required reasoning along two independent hypotheses: whether the alert definition was sound, and whether cache behavior had actually regressed. The agent spawned parallel subagents for each task, each operating in its own context window. Once both finished, it merged their conclusions. This pattern generalizes to any task with independent components. It speeds up the search, keeps intermediate work from consuming the main context window, and prevents one hypothesis from biasing another. The Feedback loop These architectural bets have enabled us to close the original scaling gap. 
Instead of debugging the agent at human speed, we could finally start using it to fix itself. As an example, we were hitting various LLM errors: timeouts, 429s (too many requests), failures in the middle of response streaming, 400s from code bugs that produced malformed payloads. These paper cuts would cause investigations to stall midway and some conversations broke entirely. So, we set up a daily monitoring task for these failures. The agent searches for the last 24 hours of errors, clusters the top hitters, traces each to its root cause in the codebase, and submits a PR. We review it manually before merging. Over two weeks, the errors were reduced by more than 80%. Over the last month, we have successfully used our agent across a wide range of scenarios: Analyzed our user churn rate and built dashboards we now review weekly. Correlated which builds needed the most hotfixes, surfacing flaky areas of the codebase. Ran security analysis and found vulnerabilities in the read path. Helped fill out parts of its own Responsible AI review, with strict human review. Handles customer-reported issues and LiveSite alerts end to end. Whenever it gets stuck, we talk to it and teach it, ask it to update its memory, and it doesn't fail that class of problem again. The title of this post is literal. The agent investigating itself is not a metaphor. It is a real workflow, driven by scheduled tasks, incident triggers, and direct conversations with users. What We Learned We spent months building scaffolding to compensate for what the agent could not do. The breakthrough was removing it. Every prewritten query was a place we told the model not to think. Every curated tool was a decision made on its behalf. Every pre-fetched context was a guess about what would matter before we understood the problem. The inversion was simple but hard to accept: stop pre-computing the answer space. 
Give the model a structured starting point, a filesystem it knows how to navigate, context hooks that tell it what it can access, and budget management that keeps it sharp through long investigations. The agent that investigates itself is both the proof and the product of this approach. It finds its own bugs, traces them to root causes in its own code, and submits its own fixes. Not because we designed it to. Because we designed it to reason over systems, and it happens to be one.

We are still learning. Staleness is unsolved, budget tuning remains largely empirical, and we regularly discover assumptions baked into context that quietly constrain the agent. But we have crossed a new threshold: from an agent that follows your playbook to one that writes the next one.

Thanks to visagarwal for co-authoring this post.

Monitor AI Agents on App Service with OpenTelemetry and the New Application Insights Agents View
Part 2 of 3: In Blog 1, we deployed a multi-agent travel planner on Azure App Service using the Microsoft Agent Framework (MAF) 1.0 GA. This post dives deep into how we instrumented those agents with OpenTelemetry and lit up the brand-new Agents (Preview) view in Application Insights. 📋 Prerequisite: This post assumes you've followed the guidance in Blog 1 to deploy the multi-agent travel planner to Azure App Service. If you haven't deployed the app yet, start there first — you'll need a running App Service with the agents, Service Bus, Cosmos DB, and Azure OpenAI provisioned before the monitoring steps in this post will work. Deploying Agents Is Only Half the Battle In Blog 1, we walked through deploying a multi-agent travel planning application on Azure App Service. Six specialized agents — a Coordinator, Currency Converter, Weather Advisor, Local Knowledge Expert, Itinerary Planner, and Budget Optimizer — work together to generate comprehensive travel plans. The architecture uses an ASP.NET Core API backed by a WebJob for async processing, Azure Service Bus for messaging, and Azure OpenAI for the brains. But here's the thing: deploying agents to production is only half the battle. Once they're running, you need answers to questions like: Which agent is consuming the most tokens? How long does the Itinerary Planner take compared to the Weather Advisor? Is the Coordinator making too many LLM calls per workflow? When something goes wrong, which agent in the pipeline failed? Traditional APM gives you HTTP latencies and exception rates. That's table stakes. For AI agents, you need to see inside the agent — the model calls, the tool invocations, the token spend. And that's exactly what Application Insights' new Agents (Preview) view delivers, powered by OpenTelemetry and the GenAI semantic conventions. Let's break down how it all works. 
The Agents (Preview) View in Application Insights Azure Application Insights now includes a dedicated Agents (Preview) blade that provides unified monitoring purpose-built for AI agents. It's not just a generic dashboard — it understands agent concepts natively. Whether your agents are built with Microsoft Agent Framework, Azure AI Foundry, Copilot Studio, or a third-party framework, this view lights up as long as your telemetry follows the GenAI semantic conventions. Here's what you get out of the box: Agent dropdown filter — A dropdown populated by gen_ai.agent.name values from your telemetry. In our travel planner, this shows all six agents: "Travel Planning Coordinator", "Currency Conversion Specialist", "Weather & Packing Advisor", "Local Expert & Cultural Guide", "Itinerary Planning Expert", and "Budget Optimization Specialist". You can filter the entire dashboard to one agent or view them all. Token usage metrics — Visualizations of input and output token consumption, broken down by agent. Instantly see which agents are the most expensive to run. Operational metrics — Latency distributions, error rates, and throughput for each agent. Spot performance regressions before users notice. End-to-end transaction details — Click into any trace to see the full workflow: which agents were invoked, what tools they called, how long each step took. The "simple view" renders agent steps in a story-like format that's remarkably easy to follow. Grafana integration — One-click export to Azure Managed Grafana for custom dashboards and alerting. The key insight: this view isn't magic. It works because the telemetry is structured using well-defined semantic conventions. Let's look at those next. 📖 Docs: Application Insights Agents (Preview) view documentation GenAI Semantic Conventions — The Foundation The entire Agents view is powered by the OpenTelemetry GenAI semantic conventions. 
These are a standardized set of span attributes that describe AI agent behavior in a way that any observability backend can understand. Think of them as the "contract" between your instrumented code and Application Insights. Let's walk through the key attributes and why each one matters: gen_ai.agent.name This is the human-readable name of the agent. In our travel planner, each agent sets this via the name parameter when constructing the MAF ChatClientAgent — for example, "Weather & Packing Advisor" or "Budget Optimization Specialist" . This is what populates the agent dropdown in the Agents view. Without this attribute, Application Insights would have no way to distinguish one agent from another in your telemetry. It's the single most important attribute for agent-level monitoring. gen_ai.agent.description A brief description of what the agent does. Our Weather Advisor, for example, is described as "Provides weather forecasts, packing recommendations, and activity suggestions based on destination weather conditions." This metadata helps operators and on-call engineers quickly understand an agent's role without diving into source code. It shows up in trace details and helps contextualize what you're looking at when debugging. gen_ai.agent.id A unique identifier for the agent instance. In MAF, this is typically an auto-generated GUID. While gen_ai.agent.name is the human-friendly label, gen_ai.agent.id is the machine-stable identifier. If you rename an agent, the ID stays the same, which is important for tracking agent behavior across code deployments. gen_ai.operation.name The type of operation being performed. Values include "chat" for standard LLM calls and "execute_tool" for tool/function invocations. In our travel planner, when the Weather Advisor calls the GetWeatherForecast function via NWS, or when the Currency Converter calls ConvertCurrency via the Frankfurter API, those tool calls get their own spans with gen_ai.operation.name = "execute_tool" . 
This lets you measure LLM think-time separately from tool execution time — a critical distinction for performance optimization. gen_ai.request.model / gen_ai.response.model The model used for the request and the model that actually served the response (these can differ when providers do model routing). In our case, both are "gpt-4o" since that's what we deploy via Azure OpenAI. These attributes let you track model usage across agents, spot unexpected model assignments, and correlate performance changes with model updates. gen_ai.usage.input_tokens / gen_ai.usage.output_tokens Token consumption per LLM call. This is what powers the token usage visualizations in the Agents view. The Coordinator agent, which aggregates results from all five specialist agents, tends to have higher output token counts because it's synthesizing a full travel plan. The Currency Converter, which makes focused API calls, uses fewer tokens overall. These attributes let you answer the question "which agent is costing me the most?" — and more importantly, let you set alerts when token usage spikes unexpectedly. gen_ai.system The AI system or provider. In our case, this is "openai" (set by the Azure OpenAI client instrumentation). If you're using multiple AI providers — say, Azure OpenAI for planning and a local model for classification — this attribute lets you filter and compare. Together, these attributes create a rich, structured view of agent behavior that goes far beyond generic tracing. They're the reason Application Insights can render agent-specific dashboards with token breakdowns, latency distributions, and end-to-end workflow views. Without these conventions, all you'd see is opaque HTTP calls to an OpenAI endpoint. 💡 Key takeaway: The GenAI semantic conventions are what transform generic distributed traces into agent-aware observability. They're the bridge between your code and the Agents view. 
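As a rough illustration of how these attributes can be consumed, the sketch below splits time by gen_ai.operation.name and totals token usage. The spans are plain dictionaries rather than real exported telemetry, the durations are invented, and the spans are assumed not to overlap in time. The attribute names are the ones described above.

```python
from collections import defaultdict

# Hypothetical spans; only the attribute names follow the GenAI conventions.
DEMO_SPANS = [
    {"duration_ms": 900.0, "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.usage.input_tokens": 450,
        "gen_ai.usage.output_tokens": 120}},
    {"duration_ms": 300.0, "attributes": {
        "gen_ai.operation.name": "execute_tool"}},
    {"duration_ms": 1500.0, "attributes": {
        "gen_ai.operation.name": "chat",
        "gen_ai.usage.input_tokens": 680,
        "gen_ai.usage.output_tokens": 350}},
    # A generic HTTP span with no GenAI attributes is ignored.
    {"duration_ms": 50.0, "attributes": {"http.request.method": "GET"}},
]


def summarize_operations(spans):
    """Total duration per GenAI operation type, plus overall token usage."""
    time_ms = defaultdict(float)
    tokens = {"input": 0, "output": 0}
    for span in spans:
        attrs = span["attributes"]
        operation = attrs.get("gen_ai.operation.name")
        if operation is None:
            continue  # not a GenAI span
        time_ms[operation] += span["duration_ms"]
        tokens["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        tokens["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return dict(time_ms), tokens


if __name__ == "__main__":
    time_ms, tokens = summarize_operations(DEMO_SPANS)
    print(time_ms)
    print(tokens)
```

A span without the conventions contributes nothing to either total, which is the practical meaning of the "opaque HTTP calls" remark above.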
Any framework that emits these attributes (MAF, Semantic Kernel, LangChain) can light up this dashboard.

Two Layers of OpenTelemetry Instrumentation

Our travel planner sample instruments at two distinct levels, each capturing different aspects of agent behavior. Let's look at both.

Layer 1: IChatClient-Level Instrumentation

The first layer instruments at the IChatClient level using Microsoft.Extensions.AI. This is where we wrap the Azure OpenAI chat client with OpenTelemetry:

    var client = new AzureOpenAIClient(azureOpenAIEndpoint, new DefaultAzureCredential());

    // Wrap with OpenTelemetry to emit GenAI semantic convention spans
    return client.GetChatClient(modelDeploymentName).AsIChatClient()
        .AsBuilder()
        .UseOpenTelemetry()
        .Build();

This single .UseOpenTelemetry() call intercepts every LLM call and emits spans with:

gen_ai.system: the AI provider (e.g., "openai")
gen_ai.request.model / gen_ai.response.model: which model was used
gen_ai.usage.input_tokens / gen_ai.usage.output_tokens: token consumption per call
gen_ai.operation.name: the operation type ("chat")

Think of this as the "LLM layer": it captures what the model is doing regardless of which agent called it. It's model-centric telemetry.

Layer 2: Agent-Level Instrumentation

The second layer instruments at the agent level using MAF 1.0 GA's built-in OpenTelemetry support.
This happens in the BaseAgent class that all our agents inherit from:

    Agent = new ChatClientAgent(
        chatClient,
        instructions: Instructions,
        name: AgentName,
        description: Description,
        tools: chatOptions.Tools?.ToList())
        .AsBuilder()
        .UseOpenTelemetry(sourceName: AgentName)
        .Build();

The .UseOpenTelemetry(sourceName: AgentName) call on the MAF agent builder emits a different set of spans:

gen_ai.agent.name: the human-readable agent name (e.g., "Weather & Packing Advisor")
gen_ai.agent.description: what the agent does
gen_ai.agent.id: the unique agent identifier
Agent invocation traces: spans that represent the full lifecycle of an agent call

This is the "agent layer": it captures which agent is doing the work and provides the identity information that powers the Agents view dropdown and per-agent filtering.

Why Both Layers?

When both layers are active, you get the richest possible telemetry. The agent-level spans nest around the LLM-level spans, creating a trace hierarchy that looks like:

    Agent: "Weather & Packing Advisor" (gen_ai.agent.name)
    └── chat (gen_ai.operation.name)
        ├── model: gpt-4o, input_tokens: 450, output_tokens: 120
        └── execute_tool: GetWeatherForecast
            └── chat (follow-up with tool results)
                └── model: gpt-4o, input_tokens: 680, output_tokens: 350

There is a tradeoff: with both layers active, you may see some span duplication since both the IChatClient wrapper and the MAF agent wrapper emit spans for the same underlying LLM call. If you find the telemetry too noisy, you can disable one layer:

Agent layer only (remove .UseOpenTelemetry() from the IChatClient): You get agent identity but lose per-call token breakdowns.
IChatClient layer only (remove .UseOpenTelemetry() from the agent builder): You get detailed LLM metrics but lose agent identity in the Agents view.

For the fullest experience with the Agents (Preview) view, we recommend keeping both layers active.
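One way to picture why the nesting matters is to attribute token counts from LLM-level spans to the nearest enclosing agent span. The sketch below does that over plain dictionaries. The span shape (span_id, parent_id, attributes) and the "(unattributed)" bucket are assumptions for illustration, and this is not a description of how Application Insights computes its views.

```python
# Hypothetical trace: one agent span wrapping two chat calls, plus a stray chat call.
DEMO_TRACE = [
    {"span_id": "a1", "parent_id": None,
     "attributes": {"gen_ai.agent.name": "Weather & Packing Advisor"}},
    {"span_id": "c1", "parent_id": "a1",
     "attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.usage.input_tokens": 450,
                    "gen_ai.usage.output_tokens": 120}},
    {"span_id": "t1", "parent_id": "c1",
     "attributes": {"gen_ai.operation.name": "execute_tool"}},
    {"span_id": "c2", "parent_id": "a1",
     "attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.usage.input_tokens": 680,
                    "gen_ai.usage.output_tokens": 350}},
    # No agent-level ancestor: this is what "IChatClient layer only" would look like.
    {"span_id": "c3", "parent_id": None,
     "attributes": {"gen_ai.operation.name": "chat",
                    "gen_ai.usage.input_tokens": 10,
                    "gen_ai.usage.output_tokens": 5}},
]


def nearest_agent(span, by_id):
    """Walk up the parent chain until a span carrying gen_ai.agent.name is found."""
    seen = set()
    current = span
    while current is not None and current["span_id"] not in seen:
        seen.add(current["span_id"])
        name = current["attributes"].get("gen_ai.agent.name")
        if name:
            return name
        current = by_id.get(current.get("parent_id"))
    return None


def tokens_by_agent(spans):
    """Sum input and output tokens per agent, based on span ancestry."""
    by_id = {span["span_id"]: span for span in spans}
    totals = {}
    for span in spans:
        attrs = span["attributes"]
        if ("gen_ai.usage.input_tokens" not in attrs
                and "gen_ai.usage.output_tokens" not in attrs):
            continue  # only spans that report usage contribute
        agent = nearest_agent(span, by_id) or "(unattributed)"
        entry = totals.setdefault(agent, {"input": 0, "output": 0})
        entry["input"] += attrs.get("gen_ai.usage.input_tokens", 0)
        entry["output"] += attrs.get("gen_ai.usage.output_tokens", 0)
    return totals


if __name__ == "__main__":
    print(tokens_by_agent(DEMO_TRACE))
```

In this toy trace, the stray chat span keeps its token counts but has no agent to belong to. That mirrors the tradeoff described above for running with only the IChatClient layer.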
The official sample uses both, and the Agents view is designed to handle the overlapping spans gracefully.

📖 Docs: MAF Observability Guide

Exporting Telemetry to Application Insights

Emitting OpenTelemetry spans is only useful if they land somewhere you can query them. The good news is that Azure App Service and Application Insights have deep native integration: App Service can auto-instrument your app, forward platform logs, and surface health metrics out of the box. For a full overview of monitoring capabilities, see Monitor Azure App Service.

For our AI agent scenario, we go beyond the built-in platform telemetry. We need the GenAI semantic convention spans that we configured in the previous sections to flow into App Insights so the Agents (Preview) view can render them. Our travel planner has two host processes, the ASP.NET Core API and a WebJob, and each requires a slightly different exporter setup.

ASP.NET Core API: Azure Monitor OpenTelemetry Distro

For the API, it's a single line. The Azure Monitor OpenTelemetry Distro handles everything:

    // Configure OpenTelemetry with Azure Monitor for traces, metrics, and logs.
    // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered.
    builder.Services.AddOpenTelemetry().UseAzureMonitor();

That's it. The distro automatically:

Discovers the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable
Configures trace, metric, and log exporters to Application Insights
Sets up appropriate sampling and batching
Registers standard ASP.NET Core HTTP instrumentation

This is the recommended approach for any ASP.NET Core application. One NuGet package (Azure.Monitor.OpenTelemetry.AspNetCore), one line of code, zero configuration files.

WebJob: Manual Exporter Setup

The WebJob is a non-ASP.NET Core host (it uses Host.CreateApplicationBuilder), so the distro's convenience method isn't available.
Instead, we configure the exporters explicitly:

    // Configure OpenTelemetry with Azure Monitor for the WebJob (non-ASP.NET Core host).
    // The APPLICATIONINSIGHTS_CONNECTION_STRING env var is auto-discovered.
    builder.Services.AddOpenTelemetry()
        .ConfigureResource(r => r.AddService("TravelPlanner.WebJob"))
        .WithTracing(t => t
            .AddSource("*")
            .AddAzureMonitorTraceExporter())
        .WithMetrics(m => m
            .AddMeter("*")
            .AddAzureMonitorMetricExporter());

    builder.Logging.AddOpenTelemetry(o => o.AddAzureMonitorLogExporter());

A few things to note:

.AddSource("*"): Subscribes to all trace sources, including the ones emitted by MAF's .UseOpenTelemetry(sourceName: AgentName). In production, you might narrow this to specific source names for performance.
.AddMeter("*"): Similarly captures all metrics, including the GenAI metrics emitted by the instrumentation layers.
.ConfigureResource(r => r.AddService("TravelPlanner.WebJob")): Tags all telemetry with the service name so you can distinguish API vs. WebJob telemetry in Application Insights.
The connection string is still auto-discovered from the APPLICATIONINSIGHTS_CONNECTION_STRING environment variable, so there is no need to pass it explicitly.

The key difference between these two approaches is ceremony, not capability. Both send the same GenAI spans to Application Insights; the Agents view works identically regardless of which exporter setup you use.

📖 Docs: Azure Monitor OpenTelemetry Distro

Infrastructure as Code: Provisioning the Monitoring Stack

The monitoring infrastructure is provisioned via Bicep modules alongside the rest of the application's Azure resources. Here's how it fits together.
Log Analytics Workspace

infra/core/monitor/loganalytics.bicep creates the Log Analytics workspace that backs Application Insights:

    resource logAnalyticsWorkspace 'Microsoft.OperationalInsights/workspaces@2023-09-01' = {
      name: name
      location: location
      tags: tags
      properties: {
        sku: {
          name: 'PerGB2018'
        }
        retentionInDays: 30
      }
    }

Application Insights

infra/core/monitor/appinsights.bicep creates a workspace-based Application Insights resource connected to Log Analytics:

    resource appInsights 'Microsoft.Insights/components@2020-02-02' = {
      name: name
      location: location
      tags: tags
      kind: 'web'
      properties: {
        Application_Type: 'web'
        WorkspaceResourceId: logAnalyticsWorkspaceId
      }
    }

    output connectionString string = appInsights.properties.ConnectionString

Wiring It All Together

In infra/main.bicep, the Application Insights connection string is passed as an app setting to the App Service:

    appSettings: {
      APPLICATIONINSIGHTS_CONNECTION_STRING: appInsights.outputs.connectionString
      // ... other app settings
    }

This is the critical glue: when the app starts, the OpenTelemetry distro (or manual exporters) auto-discover this environment variable and start sending telemetry to your Application Insights resource. No connection strings in code, no configuration files; it's all infrastructure-driven.

The same connection string is available to both the API and the WebJob since they run on the same App Service. All agent telemetry from both host processes flows into a single Application Insights resource, giving you a unified view across the entire application.

See It in Action

Once the application is deployed and processing travel plan requests, here's how to explore the agent telemetry in Application Insights.

Step 1: Open the Agents (Preview) View

In the Azure portal, navigate to your Application Insights resource. In the left nav, look for Agents (Preview) under the Investigations section. This opens the unified agent monitoring dashboard.
Step 2: Filter by Agent The agent dropdown at the top of the page is populated by the gen_ai.agent.name values in your telemetry. You'll see all six agents listed: Travel Planning Coordinator Currency Conversion Specialist Weather & Packing Advisor Local Expert & Cultural Guide Itinerary Planning Expert Budget Optimization Specialist Select a specific agent to filter the entire dashboard — token usage, latency, error rate — down to that one agent. Step 3: Review Token Usage The token usage tile shows total input and output token consumption over your selected time range. Compare agents to find your biggest spenders. In our testing, the Coordinator agent consistently uses the most output tokens because it aggregates and synthesizes results from all five specialists. Step 4: Drill into Traces Click "View Traces with Agent Runs" to see all agent executions. Each row represents a workflow run. You can filter by time range, status (success/failure), and specific agent. Step 5: End-to-End Transaction Details Click any trace to open the end-to-end transaction details. The "simple view" renders the agent workflow as a story — showing each step, which agent handled it, how long it took, and what tools were called. For a full travel plan, you'll see the Coordinator dispatch work to each specialist, tool calls to the NWS weather API and Frankfurter currency API, and the final aggregation step. Grafana Dashboards The Agents (Preview) view in Application Insights is great for ad-hoc investigation. For ongoing monitoring and alerting, Azure Managed Grafana provides prebuilt dashboards specifically designed for agent workloads. From the Agents view, click "Explore in Grafana" to jump directly into these dashboards: Agent Framework Dashboard — Per-agent metrics including token usage trends, latency percentiles, error rates, and throughput over time. Pin this to your operations wall. 
Agent Framework Workflow Dashboard: Workflow-level metrics showing how multi-agent orchestrations perform end-to-end. See how long complete travel plans take, identify bottleneck agents, and track success rates.

These dashboards query the same underlying data in Log Analytics, so there's zero additional instrumentation needed. If your telemetry lights up the Agents view, it lights up Grafana too.

Key Packages Summary

Here are the NuGet packages that make this work, pulled from the actual project files:

Package | Version | Purpose
--- | --- | ---
Azure.Monitor.OpenTelemetry.AspNetCore | 1.3.0 | Azure Monitor OTEL Distro for ASP.NET Core (API). One-line setup for traces, metrics, and logs.
Azure.Monitor.OpenTelemetry.Exporter | 1.3.0 | Azure Monitor OTEL exporter for non-ASP.NET Core hosts (WebJob). Trace, metric, and log exporters.
Microsoft.Agents.AI | 1.0.0 | MAF 1.0 GA: ChatClientAgent, .UseOpenTelemetry() for agent-level instrumentation.
Microsoft.Extensions.AI | 10.4.1 | IChatClient abstraction with .UseOpenTelemetry() for LLM-level instrumentation.
OpenTelemetry.Extensions.Hosting | 1.11.2 | OTEL dependency injection integration for Host.CreateApplicationBuilder (WebJob).
Microsoft.Extensions.AI.OpenAI | 10.4.1 | OpenAI/Azure OpenAI adapter for IChatClient. Bridges the Azure OpenAI SDK to the M.E.AI abstraction.

Wrapping Up

Let's zoom out. In this three-part series, so far we've gone from zero to a fully observable, production-grade multi-agent AI application on Azure App Service:

Blog 1 covered deploying the multi-agent travel planner with MAF 1.0 GA: the agents, the architecture, the infrastructure.
Blog 2 (this post) showed how to instrument those agents with OpenTelemetry, explained the GenAI semantic conventions that make agent-aware monitoring possible, and walked through the new Agents (Preview) view in Application Insights.
Blog 3 will show you how to secure those agents for production with the Microsoft Agent Governance Toolkit.
The pattern is straightforward:

1. Add .UseOpenTelemetry() at the IChatClient level for LLM metrics.
2. Add .UseOpenTelemetry(sourceName: AgentName) at the MAF agent level for agent identity.
3. Export to Application Insights via the Azure Monitor distro (one line) or manual exporters.
4. Wire the connection string through Bicep and environment variables.
5. Open the Agents (Preview) view and start monitoring.

With MAF 1.0 GA's built-in OpenTelemetry support and Application Insights' new Agents view, you get production-grade observability for AI agents with minimal code. The GenAI semantic conventions ensure your telemetry is structured, portable, and understood by any compliant backend. And because it's all standard OpenTelemetry, you're not locked into any single vendor: swap the exporter and your telemetry goes to Jaeger, Grafana, Datadog, or wherever you need it.

Now go see what your agents are up to and check out Blog 3.

Resources

Sample repository: seligj95/app-service-multi-agent-maf-otel
App Insights Agents (Preview) view: Documentation
GenAI Semantic Conventions: OpenTelemetry GenAI Registry
MAF Observability Guide: Microsoft Agent Framework Observability
Azure Monitor OpenTelemetry Distro: Enable OpenTelemetry for .NET
Grafana Agent Framework Dashboard: aka.ms/amg/dash/af-agent
Grafana Workflow Dashboard: aka.ms/amg/dash/af-workflow
Blog 1: Deploy Multi-Agent AI Apps on Azure App Service with MAF 1.0 GA
Blog 3: Govern AI Agents on App Service with the Microsoft Agent Governance Toolkit

Azure Functions Ignite 2025 Update
Azure Functions is redefining event-driven applications and high-scale APIs in 2025, accelerating innovation for developers building the next generation of intelligent, resilient, and scalable workloads. This year, our focus has been on empowering AI and agentic scenarios: remote MCP server hosting, bulletproofing agents with Durable Functions, and first-class support for critical technologies like OpenTelemetry, .NET 10 and Aspire. With major advances in serverless Flex Consumption, enhanced performance, security, and deployment fundamentals across Elastic Premium and Flex, Azure Functions is the platform of choice for building modern, enterprise-grade solutions.

Remote MCP

Model Context Protocol (MCP) has taken the world by storm, offering an agent a mechanism to discover and work deeply with the capabilities and context of tools. When you want to expose MCP tools to your enterprise or the world securely, we recommend you think deeply about building remote MCP servers that are designed to run securely at scale. Azure Functions is uniquely optimized to run your MCP servers at scale, offering the serverless and highly scalable features of the Flex Consumption plan, plus two flexible programming model options discussed below. All come together using the hardened Functions service plus new authentication modes for Entra and OAuth using Built-in authentication.

Remote MCP Triggers and Bindings Extension GA

Back in April, we shared a new extension that allows you to author MCP servers using functions with the MCP tool trigger. That MCP extension is now generally available, with support for C# (.NET), Java, JavaScript (Node.js), Python, and TypeScript (Node.js). The MCP tool trigger allows you to focus on what matters most: the logic of the tool you want to expose to agents. Functions will take care of all the protocol and server logistics, with the ability to scale out to support as many sessions as you want to throw at it.
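To make "protocol and server logistics" a little more concrete, here is a rough Python sketch of the kind of JSON-RPC 2.0 `tools/call` message an MCP client sends and how a server might route it to a tool. This is only an illustration under assumptions: the `get_snippet` tool, the `SNIPPETS` store, and the `handle_request` helper are hypothetical names, and this is not how the Functions MCP extension is implemented internally.

```python
import json

# Hypothetical in-memory snippet store, standing in for blob storage.
SNIPPETS = {"hello": "print('hello world')"}


def get_snippet(name: str) -> str:
    """Hypothetical tool: return a stored snippet or a short message."""
    return SNIPPETS.get(name, f"No snippet named {name!r}")


# Hypothetical registry mapping tool names to plain Python callables.
TOOLS = {"get_snippet": get_snippet}


def handle_request(raw: str) -> str:
    """Dispatch one JSON-RPC 2.0 'tools/call' message to a registered tool."""
    request = json.loads(raw)
    request_id = request.get("id")

    def reply(payload: dict) -> str:
        return json.dumps({"jsonrpc": "2.0", "id": request_id, **payload})

    if request.get("method") != "tools/call":
        return reply({"error": {"code": -32601, "message": "Method not found"}})

    params = request.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return reply({"error": {"code": -32602, "message": "Unknown tool"}})

    # Tool results are returned as a list of content items.
    text = tool(**params.get("arguments", {}))
    return reply({"result": {"content": [{"type": "text", "text": text}],
                             "isError": False}})


request = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_snippet", "arguments": {"name": "hello"}},
})
response = json.loads(handle_request(request))
print(response["result"]["content"][0]["text"])
```

With the MCP tool trigger you write only the equivalent of `get_snippet`; the parsing, routing, and response envelope sketched in `handle_request` are the parts the platform handles for you.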
    [Function(nameof(GetSnippet))]
    public object GetSnippet(
        [McpToolTrigger(GetSnippetToolName, GetSnippetToolDescription)] ToolInvocationContext context,
        [BlobInput(BlobPath)] string snippetContent
    )
    {
        return snippetContent;
    }

New: Self-hosted MCP Server (Preview)

If you’ve built servers with official MCP SDKs and want to run them as remote cloud-scale servers without re-writing any code, this public preview is for you. You can now self-host your MCP server on Azure Functions: keep your existing Python, TypeScript, .NET, or Java code and get rapid 0 to N scaling, built-in server authentication and authorization, consumption-based billing, and more from the underlying Azure Functions service. This feature complements the Azure Functions MCP extension for building MCP servers using the Functions programming model (triggers & bindings). Pick the path that fits your scenario: build with the extension or with standard MCP SDKs. Either way you benefit from the same scalable, secure, and serverless platform.

Use the official MCP SDKs:

    @mcp.tool()
    async def get_alerts(state: str) -> str:
        """Get weather alerts for a US state.

        Args:
            state: Two-letter US state code (e.g. CA, NY)
        """
        url = f"{NWS_API_BASE}/alerts/active/area/{state}"
        data = await make_nws_request(url)

        if not data or "features" not in data:
            return "Unable to fetch alerts or no alerts found."

        if not data["features"]:
            return "No active alerts for this state."

        alerts = [format_alert(feature) for feature in data["features"]]
        return "\n---\n".join(alerts)

Use Azure Functions Flex Consumption Plan's serverless compute using Custom Handlers in host.json:

    {
      "version": "2.0",
      "configurationProfile": "mcp-custom-handler",
      "customHandler": {
        "description": {
          "defaultExecutablePath": "python",
          "arguments": ["weather.py"]
        },
        "http": {
          "DefaultAuthorizationLevel": "anonymous"
        },
        "port": "8000"
      }
    }

Learn more about MCPTrigger and self-hosted MCP servers at https://aka.ms/remote-mcp

Built-in MCP server authorization (Preview)

The built-in authentication and authorization feature can now be used for MCP server authorization, using a new preview option. You can quickly define identity-based access control for your MCP servers with Microsoft Entra ID or other OpenID Connect providers. Learn more at https://aka.ms/functions-mcp-server-authorization.

Better together with Foundry agents

Microsoft Foundry is the starting point for building intelligent agents, and Azure Functions is the natural next step for extending those agents with remote MCP tools. Running your tools on Functions gives you clean separation of concerns, reuse across multiple agents, and strong security isolation. And with built-in authorization, Functions enables enterprise-ready authentication patterns, from calling downstream services with the agent’s identity to operating on behalf of end users with their delegated permissions. Build your first remote MCP server and connect it to your Foundry agent at https://aka.ms/foundry-functions-mcp-tutorial.

Agents

Microsoft Agent Framework 2.0 (Public Preview Refresh)

We’re excited about the preview refresh 2.0 release of Microsoft Agent Framework that builds on battle-hardened work from Semantic Kernel and AutoGen. Agent Framework is an outstanding solution for building multi-agent orchestrations that are both simple and powerful.
Azure Functions is a strong fit to host Agent Framework with the service’s extreme scale, serverless billing, and enterprise-grade features like VNET networking and built-in auth.

Durable Task Extension for Microsoft Agent Framework (Preview)

The durable task extension for Microsoft Agent Framework transforms how you build production-ready, resilient and scalable AI agents by bringing the proven durable execution (survives crashes and restarts) and distributed execution (runs across multiple instances) capabilities of Azure Durable Functions directly into the Microsoft Agent Framework. Combined with Azure Functions for hosting and event-driven execution, you can now deploy stateful, resilient AI agents that automatically handle session management, failure recovery, and scaling, freeing you to focus entirely on your agent logic.

Key features of the durable task extension include:

Serverless Hosting: Deploy agents on Azure Functions with auto-scaling from thousands of instances to zero, while retaining full control in a serverless architecture.
Automatic Session Management: Agents maintain persistent sessions with full conversation context that survives process crashes, restarts, and distributed execution across instances.
Deterministic Multi-Agent Orchestrations: Coordinate specialized durable agents with predictable, repeatable, code-driven execution patterns.
Human-in-the-Loop with Serverless Cost Savings: Pause for human input without consuming compute resources or incurring costs.
Built-in Observability with Durable Task Scheduler: Deep visibility into agent operations and orchestrations through the Durable Task Scheduler UI dashboard.

Create a durable agent:

    endpoint = os.getenv("AZURE_OPENAI_ENDPOINT")
    deployment_name = os.getenv("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4o-mini")

    # Create an AI agent following the standard Microsoft Agent Framework pattern
    agent = AzureOpenAIChatClient(
        endpoint=endpoint,
        deployment_name=deployment_name,
        credential=AzureCliCredential()
    ).create_agent(
        instructions="""You are a professional content writer who creates engaging,
        well-structured documents for any given topic. When given a topic, you will:
        1. Research the topic using the web search tool
        2. Generate an outline for the document
        3. Write a compelling document with proper formatting
        4. Include relevant examples and citations""",
        name="DocumentPublisher",
        tools=[
            AIFunctionFactory.Create(search_web),
            AIFunctionFactory.Create(generate_outline)
        ]
    )

    # Configure the function app to host the agent with durable session management
    app = AgentFunctionApp(agents=[agent])
    app.run()

Figure: Durable Task Scheduler dashboard for agent and agent workflow observability and debugging.

For more information on the durable task extension for Agent Framework, see the announcement: https://aka.ms/durable-extension-for-af-blog.

Flex Consumption Updates

As you know, Flex Consumption means serverless without compromise.
It combines elastic scale and pay-for-what-you-use pricing with the controls you expect: per-instance concurrency, longer executions, VNet/private networking, and Always Ready instances to minimize cold starts. Since reaching GA at Ignite 2024, Flex Consumption has had tremendous growth, with over 1.5 billion function executions per day and nearly 40 thousand apps.

Here’s what’s new for Ignite 2025:

512 MB instance size (GA). Right-size lighter workloads, scale farther within default quota.
Availability Zones (GA). Distribute instances across zones.
Rolling updates (Public Preview). Unlock zero-downtime deployments of code or config by setting a single configuration. See below for more information.
Even more improvements, including new diagnostic settings to route logs/metrics, Key Vault and App Config references, new regions, and Custom Handler support.

To get started, review Flex Consumption samples, or dive into the documentation to see how Flex can support your workloads.

Migrating to Azure Functions Flex Consumption

Migrating to Flex Consumption is simple with our step-by-step guides and agentic tools. Move your Azure Functions apps or AWS Lambda workloads, update your code and configuration, and take advantage of new automation tools. With Linux Consumption retiring, now is the time to switch. For more information, see:

Migrate Consumption plan apps to the Flex Consumption plan
Migrate AWS Lambda workloads to Azure Functions

Durable Functions

Durable Functions introduces powerful new features to help you build resilient, production-ready workflows:

Distributed Tracing: lets you track requests across components and systems, giving you deep visibility into orchestration and activities with support for App Insights and OpenTelemetry.
Extended Sessions support in .NET isolated: improves performance by caching orchestrations in memory, ideal for fast sequential activities and large fan-out/fan-in patterns.
Orchestration versioning (public preview): enables zero-downtime deployments and backward compatibility, so you can safely roll out changes without disrupting in-flight workflows.

Durable Task Scheduler Updates

Durable Task Scheduler Dedicated SKU (GA): Now generally available, the Dedicated SKU offers advanced orchestration for complex workflows and intelligent apps. It provides predictable pricing for steady workloads, automatic checkpointing, state protection, and advanced monitoring for resilient, reliable execution.
Durable Task Scheduler Consumption SKU (Public Preview): The new Consumption SKU brings serverless, pay-as-you-go orchestration to dynamic and variable workloads. It delivers the same orchestration capabilities with flexible billing, making it easy to scale intelligent applications as needed.

For more information see: https://aka.ms/dts-ga-blog

OpenTelemetry support in GA

Azure Functions OpenTelemetry is now generally available, bringing unified, production-ready observability to serverless applications. Developers can now export logs, traces, and metrics using open standards, enabling consistent monitoring and troubleshooting across every workload.

Key capabilities include:

Unified observability: Standardize logs, traces, and metrics across all your serverless workloads for consistent monitoring and troubleshooting.
Vendor-neutral telemetry: Integrate seamlessly with Azure Monitor or any OpenTelemetry-compliant backend, ensuring flexibility and choice.
Broad language support: Works with .NET (isolated), Java, JavaScript, Python, PowerShell, and TypeScript.

Start using OpenTelemetry in Azure Functions today to unlock standards-based observability for your apps. For step-by-step guidance on enabling OpenTelemetry and configuring exporters for your preferred backend, see the documentation.

Deployment with Rolling Updates (Preview)

Achieving zero-downtime deployments has never been easier.
The Flex Consumption plan now offers rolling updates as a site update strategy. Set a single property, and all future code deployments and configuration changes will be released with zero downtime. Instead of restarting all instances at once, the platform now drains existing instances in batches while scaling out the latest version to match real-time demand. This ensures uninterrupted in-flight executions and resilient throughput across your HTTP, non-HTTP, and Durable workloads, even during intensive scale-out scenarios. Rolling updates are now in public preview. Learn more at https://aka.ms/functions/rolling-updates.

Secure Identity and Networking Everywhere By Design

Security and trust are paramount. Azure Functions incorporates proven best practices by design, with full support for managed identity, eliminating secrets and simplifying secure authentication and authorization. Flex Consumption and other plans offer enterprise-grade networking features like VNETs, private endpoints, and NAT gateways for deep protection. The Azure Portal streamlines secure function creation, and built-in authentication (discussed above) enables inbound client traffic to use identity as well. Check out our updated Functions Scenarios page with quickstarts or our secure samples gallery to see these identity and networking best practices in action.

.NET 10

Azure Functions now supports .NET 10, bringing in a great suite of new features and performance benefits for your code. .NET 10 is supported on the isolated worker model, and it’s available for all plan types except Linux Consumption. As a reminder, support ends for the legacy in-process model on November 10, 2026, and the in-process model is not being updated with .NET 10. To stay supported and take advantage of the latest features, migrate to the isolated worker model.
Aspire

Aspire is an opinionated stack that simplifies development of distributed applications in the cloud. The Azure Functions integration for Aspire enables you to develop, debug, and orchestrate an Azure Functions .NET project as part of an Aspire solution. Aspire publish deploys your functions directly to Azure Functions on Azure Container Apps. Aspire 13 includes an updated preview version of the Functions integration that acts as a release candidate with go-live support. The package will be moved to GA quality with Aspire 13.1.

Java 25, Node.js 24

Azure Functions now supports Java 25 and Node.js 24 in preview. You can now develop functions using these versions locally and deploy them to Azure Functions plans. Learn how to upgrade your apps to these versions here.

In Summary

Ready to build what’s next? Update your Azure Functions Core Tools today and explore the latest samples and quickstarts to unlock new capabilities for your scenarios. The guided quickstarts run and deploy in under 5 minutes, and incorporate best practices, from architecture to security to deployment. We’ve made it easier than ever to scaffold, deploy, and scale real-world solutions with confidence. The future of intelligent, scalable, and secure applications starts now. Jump in and see what you can create!

A simpler way to deploy your code to Azure App Service for Linux
We’ve added a new deployment experience for Azure App Service for Linux that makes it easier to get your code running on your web app. To get started, go to the Kudu/SCM site for your app:

<sitename>.scm.azurewebsites.net

From there, open the new Deployments experience. You can now deploy your app by simply dragging and dropping a zip file containing your code. Once your file is uploaded, App Service shows you the contents of the zip so you can quickly verify what you’re about to deploy. If your application is already built and ready to run, you also have the option to skip the server-side build. Otherwise, App Service can handle the build step for you.

When you’re ready, select Deploy. The deployment starts right away, and you can follow each phase of the process as it happens. The experience shows clear progress through upload, build, and deployment, along with deployment logs to help you understand what’s happening behind the scenes. After the deployment succeeds, you can also view runtime logs, which makes it easier to confirm that your app has started successfully.

This experience is ideal if you’re getting started with Azure App Service and want the quickest path from code to a running app. For production workloads and teams with established release processes, you’ll typically continue using an automated CI/CD pipeline (for example, GitHub Actions or Azure DevOps) for repeatable deployments. We’re continuing to improve the developer experience on App Service for Linux. Give it a try and let us know what you think.

Deploying to Azure Web App from Azure DevOps Using UAMI
TOC

UAMI Configuration
App Configuration
Azure DevOps Configuration
Logs

UAMI Configuration

Create a User Assigned Managed Identity with no additional configuration. This identity is referenced in later steps, in particular by its Object ID.

App Configuration

On an existing Azure Web App, enable Diagnostic Settings and configure it to retain certain types of logs, such as Access Audit Logs. These logs are discussed in the final section of this article. Next, navigate to Access Control (IAM) and assign the previously created User Assigned Managed Identity the Website Contributor role.

Azure DevOps Configuration

Go to Azure DevOps → Project Settings → Service Connections, and create a new ARM (Azure Resource Manager) connection. While creating the connection:

Select the corresponding User Assigned Managed Identity
Grant it appropriate permissions at the Resource Group level

During this process, you will be prompted to sign in again using your own account. This authentication will later be reflected in the deployment logs discussed below. Assuming the following deployment template is used in the pipeline, you will notice that additional steps appear in the deployment process compared to traditional service principal-based authentication.

Logs

A few minutes after deployment, related log records will appear. In the AppServiceAuditLogs table, you can observe that the deployment initiator is shown as the Object ID of the UAMI, and the Source is listed as Azure (DevOps). This indicates that the User Assigned Managed Identity is authorized under my user context, while the deployment action itself is initiated by Azure DevOps.
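If you export those audit records (for example as JSON) and want to confirm which entries were initiated by the managed identity, a small filter along these lines can help. This is only a sketch under assumptions: the field names `User`, `Source`, and `TimeGenerated`, the sample rows, and the placeholder Object ID are all hypothetical, so adjust them to match the columns in your own export.

```python
# Placeholder value; replace with the Object ID of your own UAMI.
UAMI_OBJECT_ID = "00000000-0000-0000-0000-000000000001"

# Made-up sample rows, shaped like an exported audit log (field names assumed).
SAMPLE_ROWS = [
    {"TimeGenerated": "2025-01-01T10:00:00Z",
     "User": UAMI_OBJECT_ID,
     "Source": "Azure (DevOps)"},
    {"TimeGenerated": "2025-01-01T11:00:00Z",
     "User": "someone@example.com",
     "Source": "Portal"},
]


def deployments_by_identity(rows, object_id):
    """Return only the rows whose initiator matches the given Object ID."""
    return [row for row in rows if row.get("User") == object_id]


matches = deployments_by_identity(SAMPLE_ROWS, UAMI_OBJECT_ID)
for row in matches:
    print(row["TimeGenerated"], row["Source"])
```

Matching on the Object ID rather than a display name keeps the check unambiguous, since the Object ID is the value the log reports as the deployment initiator.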