best practices
93 TopicsSearch Less, Build More: Inner Sourcing with GitHub Copilot and ADO MCP Server
Developers burn cycles context‑switching: opening five repos to find a logging example, searching a wiki for a data masking rule, scrolling chat history for the latest pipeline pattern. Organisations that I speak to are often on the path of transformational platform engineering projects but always have the fear or doubt of "what if my engineers don't use these resources". While projects like Backstage still play a pivotal role in inner sourcing and discoverability I also empathise with developers who would argue "How would I even know in the first place, which modules have or haven't been created for reuse". In this blog we explore how we can ensure organisational standards and developer satisfaction without any heavy lifting on either side, no custom model training, no rewriting or relocating of repositories and no stagnant local data. Using GitHub Copilot + Azure DevOps MCP server (with the free `code_search` extension) we turn the IDE into an organizational knowledge interface. Instead of guessing or re‑implementing, engineers can start scaffolding projects or solving issues as they would normally (hopefully using Copilot) and without extra prompting. GitHub Copilot can lean into organisational standards and ensure recommendations are made with code snippets directly generated from existing examples. What Is the Azure DevOps MCP Server + code_search Extension? MCP (Model Context Protocol) is an open standard that lets agents (like GitHub Copilot) pull in structured, on-demand context from external systems. MCP servers contain natural language explanations of the tools that the agent can utilise allowing dynamic decision making of when to implement certain toolsets over others. The Azure DevOps MCP Server is the ADO Product Team's implementation of that standard. It exposes your ADO environment in a way Copilot can consume. Out of the box it gives you access to: Projects – list and navigate across projects in your organization. Repositories – browse repos, branches, and files. Work items – surface user stories, bugs, or acceptance criteria. Wiki's – pull policies, standards, and documentation. This means Copilot can ground its answers in live ADO content, instead of hallucinating or relying only on what’s in the current editor window. The ADO server runs locally from your own machine to ensure that all sensitive project information remains within your secure network boundary. This also means that existing permissions on ADO objects such as Projects or Repositories are respected. Wiki search tooling available out of the box with ADO MCP server is very useful however if I am honest I have seen these wiki's go unused with documentation being stored elsewhere either inside the repository or in a project management tool. This means any tool that needs to implement code requires the ability to accurately search the code stored in the repositories themself. That is where the code_search extension enablement in ADO is so important. Most organisations have this enabled already however it is worth noting that this pre-requisite is the real unlock of cross-repo search. This allows for Copilot to: Query for symbols, snippets, or keywords across all repos. Retrieve usage examples from code, not just docs. Locate standards (like logging wrappers or retry policies) wherever they live. Back every recommendation with specific source lines. In short: MCP connects Copilot to Azure DevOps. code_search makes that connection powerful by turning it into a discovery engine. What is the relevance of Copilot Instructions? One of the less obvious but most powerful features of GitHub Copilot is its ability to follow instructions files. Copilot automatically looks for these files and uses them as a “playbook” for how it should behave. There are different types of instructions you can provide: Organisational instructions – apply across your entire workspace, regardless of which repo you’re in. Repo-specific instructions – scoped to a particular repository, useful when one project has unique standards or patterns. Personal instructions – smaller overrides layered on top of global rules when a local exception applies. (Stored in .github/copilot-instructions.md) In this solution, I’m using a single personal instructions file. It tells Copilot: When to search (e.g., always query repos and wikis before answering a standards question). Where to look (Azure DevOps repos, wikis, and with code_search, the code itself). How to answer (responses must cite the repo/file/line or wiki page; if no source is found, say so). How to resolve conflicts (prefer dated wiki entries over older README fragments). As a small example, a section of a Copilot instruction file could look like this: # GitHub Copilot Instructions for Azure DevOps MCP Integration This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request. ## Core Principles ### 1. Azure DevOps Integration - **Always prioritize Azure DevOps MCP tools** when users ask about: - Work items, stories, bugs, tasks - Pull requests and code reviews - Build pipelines and deployments - Repository operations and branch management - Wiki pages and documentation - Test plans and test cases - Project and team information ### 2. Organizational Context Awareness - Before suggesting solutions, **check existing organizational patterns** by: - Searching code across repositories for similar implementations - Referencing established coding standards and frameworks - Looking for existing shared libraries and utilities - Checking architectural decision records (ADRs) in wikis ### 3. Cross-Repository Intelligence - When providing code suggestions: - **Search for existing patterns** in other repositories first - **Reference shared libraries** and common utilities - **Maintain consistency** with organizational standards - **Suggest reusable components** when appropriate ## Tool Usage Guidelines ### Work Items and Project Management When users mention bugs, features, tasks, or project planning: ``` ✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item ✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration ✅ Use: work_list_team_iterations, core_list_projects The result... To test this I created 3 ADO Projects each with between 1-2 repositories. The repositories were light with only ReadMe's inside containing descriptions of the "repo" and some code snippets examples for usage. I have then created a brand-new workspace with no context apart from a Copilot instructions document (which could be part of a repo scaffold or organisation wide) which tells Copilot to search code and the wikis across all ADO projects in my demo environment. It returns guidance and standards from all available repo's and starts to use it to formulate its response. In the screenshot I have highlighted some key parts with red boxes. The first being a section of the readme that Copilot has identified in its response, that part also highlighted within CoPilot chat response. I have highlighted the rather generic prompt I used to get this response at the bottom of that window too. Above I have highlighted Copilot using the MCP server tooling searching through projects, repo's and code. Finally the largest box highlights the instructions given to Copilot on how to search and how easily these could be optimised or changed depending on the requirements and organisational coding standards. How did I implement this? Implementation is actually incredibly simple. As mentioned I created multiple projects and repositories within my ADO Organisation in order to test cross-project & cross-repo discovery. I then did the following: Enable code_search - in your Azure DevOps organization (Marketplace → install extension). Login to Azure - Use the AZ CLI to authenticate to Azure with "az login". Create vscode/mcp.json file - Snippet is provided below, the organisation name should be changed to your organisations name. Start and enable your MCP server - In the mcp.json file you should see a "Start" button. Using the snippet below you will be prompted to add your organisation name. Ensure your Copilot agent has access to the server under "tools" too. View this setup guide for full setup instructions (azure-devops-mcp/docs/GETTINGSTARTED.md at main · microsoft/azure-devops-mcp) Create a Copilot Instructions file - with a search-first directive. I have inserted the full version used in this demo at the bottom of the article. Experiment with Prompts – Start generic (“How do we secure APIs?”). Review the output and tools used and then tailor your instructions. Considerations While this is a great approach I do still have some considerations when going to production. Latency - Using MCP tooling on every request will add some latency to developer requests. We can look at optimizing usage through copilot instructions to better identify when Copilot should or shouldn't use the ADO MCP server. Complex Projects and Repositories - While I have demonstrated cross project and cross repository retrieval my demo environment does not truly simulate an enterprise ADO environment. Performance should be tested and closely monitored as organisational complexity increases. Public Preview - The ADO MCP server is moving quickly but is currently still public preview. We have demonstrated in this article how quickly we can make our Azure DevOps content discoverable. While their are considerations moving forward this showcases a direction towards agentic inner sourcing. Feel free to comment below how you think this approach could be extended or augmented for other use cases! Resources MCP Server Config (/.vscode/mcp.json) { "inputs": [ { "id": "ado_org", "type": "promptString", "description": "Azure DevOps organization name (e.g. 'contoso')" } ], "servers": { "ado": { "type": "stdio", "command": "npx", "args": ["-y", "@azure-devops/mcp", "${input:ado_org}"] } } } Copilot Instructions (/.github/copilot-instructions.md) # GitHub Copilot Instructions for Azure DevOps MCP Integration This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request. ## Core Principles ### 1. Azure DevOps Integration - **Always prioritize Azure DevOps MCP tools** when users ask about: - Work items, stories, bugs, tasks - Pull requests and code reviews - Build pipelines and deployments - Repository operations and branch management - Wiki pages and documentation - Test plans and test cases - Project and team information ### 2. Organizational Context Awareness - Before suggesting solutions, **check existing organizational patterns** by: - Searching code across repositories for similar implementations - Referencing established coding standards and frameworks - Looking for existing shared libraries and utilities - Checking architectural decision records (ADRs) in wikis ### 3. Cross-Repository Intelligence - When providing code suggestions: - **Search for existing patterns** in other repositories first - **Reference shared libraries** and common utilities - **Maintain consistency** with organizational standards - **Suggest reusable components** when appropriate ## Tool Usage Guidelines ### Work Items and Project Management When users mention bugs, features, tasks, or project planning: ``` ✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item ✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration ✅ Use: work_list_team_iterations, core_list_projects ``` ### Code and Repository Operations When users ask about code, branches, or pull requests: ``` ✅ Use: repo_list_repos_by_project, repo_list_pull_requests_by_repo ✅ Use: repo_list_branches_by_repo, repo_search_commits ✅ Use: search_code for finding patterns across repositories ``` ### Documentation and Knowledge Sharing When users need documentation or want to create/update docs: ``` ✅ Use: wiki_list_wikis, wiki_get_page_content, wiki_create_or_update_page ✅ Use: search_wiki for finding existing documentation ``` ### Build and Deployment When users ask about builds, deployments, or CI/CD: ``` ✅ Use: pipelines_get_builds, pipelines_get_build_definitions ✅ Use: pipelines_run_pipeline, pipelines_get_build_status ``` ## Response Patterns ### 1. Discovery First Before providing solutions, always discover organizational context: ``` "Let me first check what patterns exist in your organization..." → Search code, check repositories, review existing work items ``` ### 2. Reference Organizational Standards When suggesting code or approaches: ``` "Based on patterns I found in your [RepositoryName] repository..." "Following your organization's standard approach seen in..." "This aligns with the pattern established in [TeamName]'s implementation..." ``` ### 3. Actionable Integration Always offer to create or update Azure DevOps artifacts: ``` "I can create a work item for this enhancement..." "Should I update the wiki page with this new pattern?" "Let me link this to the current iteration..." ``` ## Specific Scenarios ### New Feature Development 1. **Search existing repositories** for similar features 2. **Check architectural patterns** and shared libraries 3. **Review related work items** and planning documents 4. **Suggest implementation** based on organizational standards 5. **Offer to create work items** and documentation ### Bug Investigation 1. **Search for similar issues** across repositories and work items 2. **Check related builds** and recent changes 3. **Review test results** and failure patterns 4. **Provide solution** based on organizational practices 5. **Offer to create/update** bug work items and documentation ### Code Review and Standards 1. **Compare against organizational patterns** found in other repositories 2. **Reference coding standards** from wiki documentation 3. **Suggest improvements** based on established practices 4. **Check for reusable components** that could be leveraged ### Documentation Requests 1. **Search existing wikis** for related content 2. **Check for ADRs** and technical documentation 3. **Reference patterns** from similar projects 4. **Offer to create/update** wiki pages with findings ## Error Handling If Azure DevOps MCP tools are not available or fail: 1. **Inform the user** about the limitation 2. **Provide alternative approaches** using available information 3. **Suggest manual steps** for Azure DevOps integration 4. **Offer to help** with configuration if needed ## Best Practices ### Always Do: - ✅ Search organizational context before suggesting solutions - ✅ Reference existing patterns and standards - ✅ Offer to create/update Azure DevOps artifacts - ✅ Maintain consistency with organizational practices - ✅ Provide actionable next steps ### Never Do: - ❌ Suggest solutions without checking organizational context - ❌ Ignore existing patterns and implementations - ❌ Provide generic advice when specific organizational context is available - ❌ Forget to offer Azure DevOps integration opportunities --- **Remember: The goal is to provide intelligent, context-aware assistance that leverages the full organizational knowledge base available through Azure DevOps while maintaining development efficiency and consistency.**1.2KViews1like3CommentsIndustry-Wide Certificate Changes Impacting Azure App Service Certificates
Executive Summary In early 2026, industry-wide changes mandated by browser applications and the CA/B Forum will affect both how TLS certificates are issued as well as their validity period. The CA/B Forum is a vendor body that establishes standards for securing websites and online communications through SSL/TLS certificates. Azure App Service is aligning with these standards for both App Service Managed Certificates (ASMC, free, DigiCert-issued) and App Service Certificates (ASC, paid, GoDaddy-issued). Most customers will experience no disruption. Action is required only if you pin certificates or use them for client authentication (mTLS). Who Should Read This? App Service administrators Security and compliance teams Anyone responsible for certificate management or application security Quick Reference: What’s Changing & What To Do Topic ASMC (Managed, free) ASC (GoDaddy, paid) Required Action New Cert Chain New chain (no action unless pinned) New chain (no action unless pinned) Remove certificate pinning Client Auth EKU Not supported (no action unless cert is used for mTLS) Not supported (no action unless cert is used for mTLS) Transition from mTLS Validity No change (already compliant) Two overlapping certs issued for the full year None (automated) If you do not pin certificates or use them for mTLS, no action is required. Timeline of Key Dates Date Change Action Required Mid-Jan 2026 and after ASMC migrates to new chain ASMC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Mar 2026 and after ASC validity shortened ASC migrates to new chain ASC stops supporting client auth EKU Remove certificate pinning if used Transition to alternative authentication if the certificate is used for mTLS Actions Checklist For All Users Review your use of App Service certificates. If you do not pin these certificates and do not use them for mTLS, no action is required. If You Pin Certificates (ASMC or ASC) Remove all certificate or chain pinning before their respective key change dates to avoid service disruption. See Best Practices: Certificate Pinning. If You Use Certificates for Client Authentication (mTLS) Switch to an alternative authentication method before their respective key change dates to avoid service disruption, as client authentication EKU will no longer be supported for these certificates. See Sunsetting the client authentication EKU from DigiCert public TLS certificates. See Set Up TLS Mutual Authentication - Azure App Service Details & Rationale Why Are These Changes Happening? These updates are required by major browser programs (e.g., Chrome) and apply to all public CAs. They are designed to enhance security and compliance across the industry. Azure App Service is automating updates to minimize customer impact. What’s Changing? New Certificate Chain Certificates will be issued from a new chain to maintain browser trust. Impact: Remove any certificate pinning to avoid disruption. Removal of Client Authentication EKU Newly issued certificates will not support client authentication EKU. This change aligns with Google Chrome’s root program requirements to enhance security. Impact: If you use these certificates for mTLS, transition to an alternate authentication method. Shortening of Certificate Validity Certificate validity is now limited to a maximum of 200 days. Impact: ASMC is already compliant; ASC will automatically issue two overlapping certificates to cover one year. No billing impact. Frequently Asked Questions (FAQs) Will I lose coverage due to shorter validity? No. For App Service Certificate, App Service will issue two certificates to span the full year you purchased. Is this unique to DigiCert and GoDaddy? No. This is an industry-wide change. Do these changes impact certificates from other CAs? Yes. These changes are an industry-wide change. We recommend you reach out to your certificates’ CA for more information. Do I need to act today? If you do not pin or use these certs for mTLS, no action is required. Glossary ASMC: App Service Managed Certificate (free, DigiCert-issued) ASC: App Service Certificate (paid, GoDaddy-issued) EKU: Extended Key Usage mTLS: Mutual TLS (client certificate authentication) CA/B Forum: Certification Authority/Browser Forum Additional Resources Changes to the Managed TLS Feature Set Up TLS Mutual Authentication Azure App Service Best Practices – Certificate pinning DigiCert Root and Intermediate CA Certificate Updates 2023 Sunsetting the client authentication EKU from DigiCert public TLS certificates Feedback & Support If you have questions or need help, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.566Views1like0CommentsImportant Changes to App Service Managed Certificates: Is Your Certificate Affected?
Overview As part of an upcoming industry-wide change, DigiCert, the Certificate Authority (CA) for Azure App Service Managed Certificates (ASMC), is required to migrate to a new validation platform to meet multi-perspective issuance corroboration (MPIC) requirements. While most certificates will not be impacted by this change, certain site configurations and setups may prevent certificate issuance or renewal starting July 28, 2025. Update December 8, 2025 We’ve published an update in November about how App Service Managed Certificates can now be supported on sites that block public access. This reverses the limitation introduced in July 2025, as mentioned in this blog. Note: This blog post reflects a point-in-time update and will not be revised. For the latest and most accurate details on App Service Managed Certificates, please refer to official documentation or subsequent updates. Learn more about the November 2025 update here: Follow-Up to ‘Important Changes to App Service Managed Certificates’: November 2025 Update. August 5, 2025 We’ve published a Microsoft Learn documentation titled App Service Managed Certificate (ASMC) changes – July 28, 2025 that contains more in-depth mitigation guidance and a growing FAQ section to support the changes outlined in this blog post. While the blog currently contains the most complete overview, the documentation will soon be updated to reflect all blog content. Going forward, any new information or clarifications will be added to the documentation page, so we recommend bookmarking it for the latest guidance. What Will the Change Look Like? For most customers: No disruption. Certificate issuance and renewals will continue as expected for eligible site configurations. For impacted scenarios: Certificate requests will fail (no certificate issued) starting July 28, 2025, if your site configuration is not supported. Existing certificates will remain valid until their expiration (up to six months after last renewal). Impacted Scenarios You will be affected by this change if any of the following apply to your site configurations: Your site is not publicly accessible: Public accessibility to your app is required. If your app is only accessible privately (e.g., requiring a client certificate for access, disabling public network access, using private endpoints or IP restrictions), you will not be able to create or renew a managed certificate. Other site configurations or setup methods not explicitly listed here that restrict public access, such as firewalls, authentication gateways, or any custom access policies, can also impact eligibility for managed certificate issuance or renewal. Action: Ensure your app is accessible from the public internet. However, if you need to limit access to your app, then you must acquire your own SSL certificate and add it to your site. Your site uses Azure Traffic Manager "nested" or "external" endpoints: Only “Azure Endpoints” on Traffic Manager will be supported for certificate creation and renewal. “Nested endpoints” and “External endpoints” will not be supported. Action: Transition to using "Azure Endpoints". However, if you cannot, then you must obtain a different SSL certificate for your domain and add it to your site. Your site relies on *.trafficmanager.net domain: Certificates for *.trafficmanager.net domains will not be supported for creation or renewal. Action: Add a custom domain to your app and point the custom domain to your *.trafficmanager.net domain. After that, secure the custom domain with a new SSL certificate. If none of the above applies, no further action is required. How to Identify Impacted Resources? To assist with the upcoming changes, you can use Azure Resource Graph (ARG) queries to help identify resources that may be affected under each scenario. Please note that these queries are provided as a starting point and may not capture every configuration. Review your environment for any unique setups or custom configurations. Scenario 1: Sites Not Publicly Accessible This ARG query retrieves a list of sites that either have the public network access property disabled or are configured to use client certificates. It then filters for sites that are using App Service Managed Certificates (ASMC) for their custom hostname SSL bindings. These certificates are the ones that could be affected by the upcoming changes. However, please note that this query does not provide complete coverage, as there may be additional configurations impacting public access to your app that are not included here. Ultimately, this query serves as a helpful guide for users, but a thorough review of your environment is recommended. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment. // ARG Query: Identify App Service sites that commonly restrict public access and use ASMC for custom hostname SSL bindings resources | where type == "microsoft.web/sites" // Extract relevant properties for public access and client certificate settings | extend publicNetworkAccess = tolower(tostring(properties.publicNetworkAccess)), clientCertEnabled = tolower(tostring(properties.clientCertEnabled)) // Filter for sites that either have public network access disabled // or have client certificates enabled (both can restrict public access) | where publicNetworkAccess == "disabled" or clientCertEnabled != "false" // Expand the list of SSL bindings for each site | mv-expand hostNameSslState = properties.hostNameSslStates | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint) // Only consider custom domains (exclude default *.azurewebsites.net) and sites with an SSL certificate bound | where tolower(hostName) !endswith "azurewebsites.net" and isnotempty(thumbprint) // Select key site properties for output | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint, publicNetworkAccess, clientCertEnabled // Join with certificates to find only those using App Service Managed Certificates (ASMC) // ASMCs are identified by the presence of the "canonicalName" property | join kind=inner ( resources | where type == "microsoft.web/certificates" | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property | where isnotempty(canonicalName) | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName ) on $left.thumbprint == $right.certThumbprint // Final output: sites with restricted public access and using ASMC for custom hostname SSL bindings | project siteName, siteId, siteResourceGroup, publicNetworkAccess, clientCertEnabled, thumbprint, certName, certId, certResourceGroup, certExpiration, canonicalName Scenario 2: Traffic Manager Endpoint Types For this scenario, please manually review your Traffic Manager profile configurations to ensure only “Azure Endpoints” are in use. We recommend inspecting your Traffic Manager profiles directly in the Azure portal or using relevant APIs to confirm your setup and ensure compliance with the new requirements. Scenario 3: Certificates Issued to *.trafficmanager.net Domains This ARG query helps you identify App Service Managed Certificates (ASMC) that were issued to *.trafficmanager.net domains. In addition, it also checks whether any web apps are currently using those certificates for custom domain SSL bindings. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment. // ARG Query: Identify App Service Managed Certificates (ASMC) issued to *.trafficmanager.net domains // Also checks if any web apps are currently using those certificates for custom domain SSL bindings resources | where type == "microsoft.web/certificates" // Extract the certificate thumbprint and canonicalName (ASMCs have a canonicalName property) | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property // Filter for certificates issued to *.trafficmanager.net domains | where canonicalName endswith "trafficmanager.net" // Select key certificate properties for output | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName // Join with web apps to see if any are using these certificates for SSL bindings | join kind=leftouter ( resources | where type == "microsoft.web/sites" // Expand the list of SSL bindings for each site | mv-expand hostNameSslState = properties.hostNameSslStates | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint) // Only consider bindings for *.trafficmanager.net custom domains with a certificate bound | where tolower(hostName) endswith "trafficmanager.net" and isnotempty(thumbprint) // Select key site properties for output | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint ) on $left.certThumbprint == $right.thumbprint // Final output: ASMCs for *.trafficmanager.net domains and any web apps using them | project certName, certId, certResourceGroup, certExpiration, canonicalName, siteName, siteId, siteResourceGroup Ongoing Updates We will continue to update this post with any new queries or important changes as they become available. Be sure to check back for the latest information. Note on Comments We hope this information helps you navigate the upcoming changes. To keep this post clear and focused, comments are closed. If you have questions, need help, or want to share tips or alternative detection methods, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.24KViews1like1CommentReimagining AI Ops with Azure SRE Agent: New Automation, Integration, and Extensibility features
Azure SRE Agent offers intelligent and context aware automation for IT operations. Enhanced by customer feedback from our preview, the SRE Agent has evolved into an extensible platform to automate and manage tasks across Azure and other environments. Built on an Agentic DevOps approach - drawing from proven practices in internal Azure operations - the Azure SRE Agent has already saved over 20,000 engineering hours across Microsoft product teams operations, delivering strong ROI for teams seeking sustainable AIOps. An Operations Agent that adapts to your playbooks Azure SRE Agent is an AI powered operations automation platform that empowers SREs, DevOps, IT operations, and support teams to automate tasks such as incident response, customer support, and developer operations from a single, extensible agent. Its value proposition and capabilities have evolved beyond diagnosis and mitigation of Azure issues, to automating operational workflows and seamless integration with the standards and processes used in your organization. SRE Agent is designed to automate operational work and reduce toil, enabling developers and operators to focus on high-value tasks. By streamlining repetitive and complex processes, SRE Agent accelerates innovation and improves reliability across cloud and hybrid environments. In this article, we will look at what’s new and what has changed since the last update. What’s New: Automation, Integration, and Extensibility Azure SRE Agent just got a major upgrade. From no-code automation to seamless integrations and expanded data connectivity, here’s what’s new in this release: No-code Sub-Agent Builder: Rapidly create custom automations without writing code. Flexible, event-driven triggers: Instantly respond to incidents and operational changes. Expanded data connectivity: Unify diagnostics and troubleshooting across more data sources. Custom actions: Integrate with your existing tools and orchestrate end-to-end workflows via MCP. Prebuilt operational scenarios: Accelerate deployment and improve reliability out of the box. Unlike generic agent platforms, Azure SRE Agent comes with deep integrations, prebuilt tools, and frameworks specifically for IT, DevOps, and SRE workflows. This means you can automate complex operational tasks faster and more reliably, tailored to your organization’s needs. Sub-Agent Builder: Custom Automation, No Code Required Empower teams to automate repetitive operational tasks without coding expertise, dramatically reducing manual workload and development cycles. This feature helps address the need for targeted automation, letting teams solve specific operational pain points without relying on one-size-fits-all solutions. Modular Sub-Agents: Easily create custom sub-agents tailored to your team’s needs. Each sub-agent can have its own instructions, triggers, and toolsets, letting you automate everything from outage response to customer email triage. Prebuilt System Tools: Eliminate the inefficiency of creating basic automation from scratch, and choose from a rich library of hundreds of built-in tools for Azure operations, code analysis, deployment management, diagnostics, and more. Custom Logic: Align automation to your unique business processes by defining your automation logic and prompts, teaching the agent to act exactly as your workflow requires. Flexible Triggers: Automate on Your Terms Invoke the agent to respond automatically to mission-critical events, not wait for manual commands. This feature helps speed up incident response and eliminate missed opportunities for efficiency. Multi-Source Triggers: Go beyond chat-based interactions, and trigger the agent to automatically respond to Incident Management and Ticketing systems like PagerDuty and ServiceNow, Observability Alerting systems like Azure Monitor Alerts, or even on a cron-based schedule for proactive monitoring and best-practices checks. Additional trigger sources such as GitHub issues, Azure DevOps pipelines, email, etc. will be added over time. This means automation can start exactly when and where you need it. Event-Driven Operations: Integrate with your CI/CD, monitoring, or support systems to launch automations in response to real-world events - like deployments, incidents, or customer requests. Vital for reducing downtime, it ensures that business-critical actions happen automatically and promptly. Expanded Data Connectivity: Unified Observability and Troubleshooting Integrate data, enabling comprehensive diagnostics and troubleshooting and faster, more informed decision-making by eliminating silos and speeding up issue resolution. Multiple Data Sources: The agent can now read data from Azure Monitor, Log Analytics, and Application Insights based on its Azure role-based access control (RBAC). Additional observability data sources such as Dynatrace, New Relic, Datadog, and more can be added via the Remote Model Context Protocol (MCP) servers for these tools. This gives you a unified view for diagnostics and automation. Knowledge Integration: Rather than manually detailing every instruction in your prompt, you can upload your Troubleshooting Guide (TSG) or Runbook directly, allowing the agent to automatically create an execution plan from the file. You may also connect the agent to resources like SharePoint, Jira, or documentation repositories through Remote MCP servers, enabling it to retrieve needed files on its own. This approach utilizes your organization’s existing knowledge base, streamlining onboarding and enhancing consistency in managing incidents. Azure SRE Agent is also building multi-agent collaboration by integrating with PagerDuty and Neubird, enabling advanced, cross-platform incident management and reliability across diverse environments. Custom Actions: Automate Anything, Anywhere Extend automation beyond Azure and integrate with any tool or workflow, solving the problem of limited automation scope and enabling end-to-end process orchestration. Out-of-the-Box Actions: Instantly automate common tasks like running azcli, kubectl, creating GitHub issues, or updating Azure resources, reducing setup time and operational overhead. Communication Notifications: The SRE Agent now features built-in connectors for Outlook, enabling automated email notifications, and for Microsoft Teams, allowing it to post messages directly to Teams channels for streamlined communication. Bring Your Own Actions: Drop in your own Remote MCP servers to extend the agent’s capabilities to any custom tool or workflow. Future-proof your agentic DevOps by automating proprietary or emerging processes with confidence. Prebuilt Operations Scenarios Address common operational challenges out of the box, saving teams time and effort while improving reliability and customer satisfaction. Incident Response: Minimize business impact and reduce operational risk by automating detection, diagnosis, and mitigation of your workload stack. The agent has built-in runbooks for common issues related to many Azure resource types including Azure Kubernetes Service (AKS), Azure Container Apps (ACA), Azure App Service, Azure Logic Apps, Azure Database for PostgreSQL, Azure CosmosDB, Azure VMs, etc. Support for additional resource types is being added continually, please see product documentation for the latest information. Root Cause Analysis & IaC Drift Detection: Instantly pinpoint incident causes with AI-driven root cause analysis including automated source code scanning via GitHub and Azure DevOps integration. Proactively detect and resolve infrastructure drift by comparing live cloud environments against source-controlled IaC, ensuring configuration consistency and compliance. Handle Complex Investigations: Enable the deep investigation mode that uses a hypothesis-driven method to analyze possible root causes. It collects logs and metrics, tests hypotheses with iterative checks, and documents findings. The process delivers a clear summary and actionable steps to help teams accurately resolve critical issues. Incident Analysis: The integrated dashboard offers a comprehensive overview of all incidents managed by the SRE Agent. It presents essential metrics, including the number of incidents reviewed, assisted, and mitigated by the agent, as well as those awaiting human intervention. Users can leverage aggregated visualizations and AI-generated root cause analyses to gain insights into incident processing, identify trends, enhance response strategies, and detect areas for improvement in incident management. Inbuilt Agent Memory: The new SRE Agent Memory System transforms incident response by institutionalizing the expertise of top SREs - capturing, indexing, and reusing critical knowledge from past incidents, investigations, and user guidance. Benefit from faster, more accurate troubleshooting, as the agent learns from both successes and mistakes, surfacing relevant insights, runbooks, and mitigation strategies exactly when needed. This system leverages advanced retrieval techniques and a domain-aware schema to ensure every on-call engagement is smarter than the last, reducing mean time to resolution (MTTR) and minimizing repeated toil. Automatically gain a continuously improving agent that remembers what works, avoids past pitfalls, and delivers actionable guidance tailored to the environment. GitHub Copilot and Azure DevOps Integration: Automatically triage, respond to, and resolve issues raised in GitHub or Azure DevOps. Integration with modern development platforms such as GitHub Copilot coding agent increases efficiency and ensures that issues are resolved faster, reducing bottlenecks in the development lifecycle. Ready to get started? Azure SRE Agent home page Product overview Pricing Page Pricing Calculator Pricing Blog Demo recordings Deployment samples What’s Next? Give us feedback: Your feedback is critical - You can Thumbs Up / Thumbs Down each interaction or thread, or go to the “Give Feedback” button in the agent to give us in-product feedback - or you can create issues or just share your thoughts in our GitHub repo at https://github.com/microsoft/sre-agent. We’re just getting started. In the coming months, expect even more prebuilt integrations, expanded data sources, and new automation scenarios. We anticipate continuous growth and improvement throughout our agentic AI platforms and services to effectively address customer needs and preferences. Let us know what Ops toil you want to automate next!2.4KViews0likes0CommentsAzure SRE Agent: Expanding Observability and Multi-Cloud Resilience
The Azure SRE Agent continues to evolve as a cornerstone for operational excellence and incident management. Over the past few months, we have made significant strides in enabling integrations with leading observability platforms—Dynatrace, New Relic, and Datadog—through Model Context Protocol (MCP) Servers. These partnerships serve joint customers, enabling automated remediation across diverse environments. Deepening Integrations with MCP Servers Our collaboration with these partners is more than technical—it’s about delivering value at scale. Datadog, New Relic, and Dynatrace are all Azure Native ISV Service partners. With these integrations Azure Native customers can also choose to add these MCP servers directly from the Azure Native partners’ resource: Datadog: At Ignite, Azure SRE Agent was presented with the Datadog MCP Server, to demonstrate how our customers can streamline complex workflows. Customers can now bring their Datadog MCP Server into Azure SRE Agent, extending knowledge capabilities and centralizing logs and metrics. Find Datadog Azure Native offerings on Marketplace. New Relic: When an alert fires in New Relic, the Azure SRE Agent calls the New Relic MCP Server to provide Intelligent Observability insights. This agentic integration with the New Relic MCP Server offers over 35 specialized tools across, entity and account management, alerts and monitoring, data analysis and queries, performance analysis, and much more. The advanced remediation skills of the Azure SRE Agent + New Relic AI help our joint customers diagnose and resolve production issues faster. Find New Relic’s Azure Native offering on Marketplace Dynatrace: The Dynatrace integration bridges Microsoft Azure's cloud-native infrastructure management with Dynatrace's AI-powered observability platform, leveraging the Davis AI engine and remote MCP server capabilities for incident detection, root cause analysis, and remediation across hybrid cloud environments. Check out Dynatrace’s Azure Native offering on Marketplace. These integrations are made possible by Azure SRE Agent’s MCP connectors. The MCP connectors in Azure SRE Agent act as the bridge between the agent and MCP servers, enabling dynamic discovery and execution of specialized tools for observability and incident management across diverse environments. This feature allows customers to build their own custom sub-agents to leverage tools from MCP Servers from integrated platforms like Dynatrace, Datadog, and New Relic to complement the agent’s diagnostic and remediation capabilities. By connecting Azure SRE Agent to external MCP servers scenarios such as cross-platform telemetry analysis are unlocked. Looking Ahead: Multi-Agent Collaboration Azure SRE Agent isn’t stopping with MCP integrations We’re actively working with PagerDuty and NeuBird to support dynamic use cases via agent-to-agent collaboration: PagerDuty: PagerDuty’s PD Advance SRE Agent is an AI-powered assistant that triages incidents by analyzing logs, diagnostics, past incident history, and runbooks to surface relevant context and recommended remediations. At Ignite, PagerDuty and Microsoft demonstrated how Azure SRE Agent can ingest PagerDuty incidents and collaborate with PagerDuty’s SRE Agent to complement triage using historical patterns, runbook intelligence and Azure diagnostics. NeuBird: NeuBird’s Agentic AI SRE, Hawkeye, autonomously investigates and resolves incidents across hybrid, and multi-cloud environments. By connecting to telemetry sources like Azure Monitor, Prometheus, and GitHub, Hawkeye delivers real-time diagnosis and targeted fixes. Building on the work presented at SRE Day this partnership underscores our commitment to agentic ecosystems where specialized agents collaborate for complex scenarios. Sign up for the private preview to try the integration, here. Additionally, please check out NeuBird on Marketplace. These efforts reflect a broader vision: Azure SRE Agent as a hub for cross-platform reliability, enabling customers to manage incidents across Azure, on-premises, and other clouds with confidence. Why This Matters As organizations embrace distributed architectures, the need for integrated, intelligent, and multi-cloud-ready SRE solutions has never been greater. By partnering with industry leaders and pioneering agent-to-agent workflows, Azure SRE Agent is setting the stage for a future where resilience is not just reactive—it’s proactive and collaborative.666Views2likes0CommentsProactive Monitoring Made Simple with Azure SRE Agent
SRE teams strive for proactive operations, catching issues before they impact customers and reducing the chaos of incident response. While perfection may be elusive, the real goal is minimizing outages and gaining immediate line of sight into production environments. Today, that’s harder than ever. It requires correlating countless signals and alerts, understanding how they relate—or don’t relate—to each other, and assigning the right sense of urgency and impact. Anything that shortens this cycle, accelerates detection, and enables automated remediation is what modern SRE teams crave. What if you could skip the scripting and pipelines? What if you could simply describe what you want in plain language and let it run automatically on a schedule? Scheduled Tasks for Azure SRE Agent With Scheduled Tasks for Azure SRE Agent, that what-if scenario is now a reality. Scheduled tasks combine natural language prompts with Azure SRE Agent’s automation capabilities, so you can express intent, set a schedule, and let the agent do the rest—without writing a single line of code. This means: ⚡ Faster incident response through early detection ✅ Better compliance with automated checks 🎯 More time for high-value engineering work and innovation 💡 The shift from reactive to proactive: Instead of waiting for alerts to fire or customers to report issues, you’re continuously monitoring, validating, and catching problems before they escalate. How Scheduled Tasks Work Under the Hood When you create a Scheduled Task, the process is more than just running a prompt on a timer. Here’s what happens: 1. Prompt Interpretation and Plan Creation The SRE Agent takes your natural language prompt—such as “Scan all resources for security best practices”—and converts it into a structured execution plan. This plan defines the steps, tools, and data sources required to fulfill your request. 2. Built-In Tools and MCP Integration The agent uses its built-in capabilities (Azure CLI, Log Analytics workspace, Appinsights) and can also leverage 3 rd party data sources or tools via MCP server integration for extended functionality. 3. Results Analysis and Smart Summarization After execution, the agent analyzes results, identifies anomalies or issues, and provides actionable summaries not just raw data dumps. 4. Notification and Escalation Based on findings, the agent can: Post updates to Teams channels Create or update incidents Send email notifications Trigger follow-up actions Real-World Use Cases for Proactive Ops Here’s where scheduled tasks shine for SRE teams: Use Case Prompt Example Schedule Security Posture Check “Scan all subscriptions for resources with public endpoints and flag any that shouldn’t be exposed” Daily Cost Anomaly Detection “Compare this week’s spend against last week and alert if any service exceeds 20% growth” Weekly Compliance Drift Detection “Check all storage accounts for encryption settings and report any non-compliant resources” Daily Resource Health Summary “Summarize the health status of all production VMs and highlight any degraded instances” Every 4 hours Incident Trend Analysis “Analyze ICM incidents from the past week, identify patterns, and summarize top contributing services” Weekly Getting Started in 3 Steps Step 1: Define Your Intent Write a natural language prompt describing what you want to monitor or check. Be specific about: - What resources or scope - What conditions to look for - What action to take if issues are found Example: > “Every morning at 8 AM, check all production Kubernetes clusters for pods in CrashLoopBackOff state. If any are found, post a summary to the #sre-alerts Teams channel with cluster name, namespace, and pod details.” Step 2: Set Your Schedule Choose how often the task should run: - Cron expressions for precise control - Simple intervals (hourly, daily, weekly) Step 3: Define Where to Receive Updates Include in your prompt where you want results delivered when the task finishes execution. The agent can use its built-in tools and connectors to: - Post summaries to a Teams channel - Send email notifications - Create or update ICM incidents Example prompt with notification: > "Check all production databases for long-running queries over 30 seconds. If any are found, post a summary to the #database-alerts Teams channel." Why This Matters for Proactive Operations Traditional monitoring approaches have limitations: Traditional Approach With Scheduled Tasks Write scripts, maintain pipelines Describe in plain language Static thresholds and rules Contextual, AI-powered analysis Alert fatigue from noisy signals Smart summarization of what matters Separate tools for check vs. action Unified detection and response Requires dedicated DevOps effort Any SRE can create and modify The result? Your team spends less time building and maintaining monitoring infrastructure and more time on the work that truly requires human expertise. Best Practices for Scheduled Tasks Start simple, iterate — Begin with one or two high-value checks and expand as you gain confidence Be specific in prompts — The more context you provide, the better the results Set appropriate frequencies — Not everything needs to run hourly; match the schedule to the risk Review and refine — Check task results periodically and adjust prompts for better accuracy What’s Next? Scheduled tasks are just the beginning. We’re continuing to invest in capabilities that help SRE teams shift left—catching issues earlier, automating routine checks, and freeing up time for strategic work. Ready to Start? Use this sample that shows how to create a scheduled health check sub-agent: https://github.com/microsoft/sre-agent/blob/main/samples/automation/samples/02-scheduled-health-check-sample.md This example demonstrates: - Building a HealthCheckAgent using built-in tools like Azure CLI and Log Analytics Workspace - Scheduling daily health checks for a container app at 7 AM - Sending email alerts when anomalies are detected 🔗 Explore more samples here: https://github.com/microsoft/sre-agent/tree/main/samples More to Learn Ignite 2025 announcements: https://aka.ms/ignite25/blog/sreagent Documentation: https://aka.ms/sreagent/docs Support & Feature Requests: https://github.com/microsoft/sre-agent/issues610Views0likes0CommentsDeploying a Bun + Hono + Vite app to Azure App Service
TOC Introduction Local Environment Deployment Conclusion 1. Introduction Anthropic, the company behind Claude, recently acquired the JavaScript runtime startup Bun, marking one of the most significant shifts in the modern JavaScript ecosystem since the arrival of Node.js and Deno. This acquisition signals more than a business move, it represents a strategic consolidation of performance-oriented tooling, developer ergonomics, and the future of AI-accelerated software development. At the center of this momentum lies a powerful trio: Bun, Hono, and Vite. Bun is a next-generation JavaScript runtime that reimagines everything from dependency installation to HTTP servers, bundling, and execution speed. On top of Bun, frameworks like Hono provide an elegant, lightweight approach to building APIs and full web applications. Hono embraces the Web Standard API model while optimizing for speed and minimal footprint. For front-end development, Vite completes the trio. Vite provides lightning-fast local development through native ES modules and an optimized build pipeline. When paired with Bun, the developer experience becomes even smoother, as Bun accelerates not only the dev server but also the entire build process. The result is a full-stack workflow where front-end and back-end both benefit from a consistent, high-performance environment. This article will guide you through deploying a Bun + Hono + Vite application to an Azure Linux Web App. 2. Local Environment The development environment used in this example is a Docker container. All project creation, modification, and testing will take place inside this environment. We will build an application using Bun + Hono + Vite, containing three endpoints: / : The root endpoint, which displays Hello Bun Hono Vite /api/hello : A static endpoint /api/backend : A runtime endpoint executed on the server, returning computed results for the frontend to display Create the Docker development environment This command generates a Bun + Hono + Vite project. docker run --rm -it -v "$PWD":/app -w /app oven/bun:latest bunx create-vite . --template vanilla-ts Follow the prompts as shown in the image and make the appropriate selections. After completing the prompts, the corresponding project files will be created. At this point, you may press Ctrl + C to stop the running Bun server. Next, install Hono: docker run --rm -it -v "$PWD":/app -w /app oven/bun:latest bun add hono Create 4 files and edit 2 files Create .vscode/settings.json This file serves two purposes: To prevent the large node_modules folder from being uploaded during deployment. Uploading it wastes bandwidth and may cause compatibility issues between local and production environments. To disable ORYX BUILD from interfering in the deployment process. We will use a custom startup script to handle all build tasks instead. { "appService.zipIgnorePattern": [ "node_modules{,/**}", ".git{,/**}", ".vscode{,/**}" ], "appService.showBuildDuringDeployPrompt": false } Create vite.config.ts This file configures Vite’s base path to use relative paths instead of absolute paths. Although this does not affect the production environment, it is essential locally when URLs and ports may differ. import { defineConfig } from 'vite'; export default defineConfig({ base: './', }); Create server.ts This file configures backend routing using Hono and sets up the Bun server. It includes several test endpoints, some static, and some executed at runtime. import { Hono } from 'hono'; import { serveStatic } from 'hono/bun'; const app = new Hono(); app.get('/api/hello', (c) => { return c.text('this is api/hello'); }); app.get('/api/backend', (c) => { const result = 1 + 1; return c.json({ message: "this is /api/backend", calc: `1 + 1 = ${result}`, value: result, }); }); app.use('/assets/*', serveStatic({ root: './dist' })); app.use('/*', serveStatic({ root: './dist' })); app.get('/', serveStatic({ path: './dist/index.html' })); const port = Number(process.env.PORT ?? 3000); export default { port, fetch: app.fetch, }; Create startup.sh This script serves several important roles: Remove any node_modules folders or tar archives created by ORYX during deployment Install the Bun runtime Fully take over the ORYX build process Start the Bun server as the final step #!/bin/bash set -e echo "===== Startup script running =====" cd /home/site/wwwroot echo "Cleaning up Oryx-generated node_modules..." if [ -d /node_modules ]; then echo "Removing /node_modules ..." rm -rf /node_modules fi if [ -f node_modules.tar.gz ]; then echo "Removing node_modules.tar.gz ..." rm -f node_modules.tar.gz fi echo "Oryx cleanup complete." export BUN_INSTALL=/home/site/wwwroot/.bun export PATH="$BUN_INSTALL/bin:/home/site/wwwroot/node_modules/.bin:$PATH" export NODE_PATH="/home/site/wwwroot/node_modules" if [ ! -f "$BUN_INSTALL/bin/bun" ]; then echo "Bun not found. Installing..." curl -fsSL https://bun.sh/install | bash else echo "Bun already installed at $BUN_INSTALL" fi echo "Using Bun version:" bun --version echo "Running bun install ..." bun install echo "Running bun run build ..." bun run build echo "Starting server with bun run start ..." bun run start Modify src/main.ts Simplify the default welcome page so it only displays our test text. // src/main.ts document.querySelector<HTMLDivElement>('#app')!.innerHTML = ` <h1>Hello Bun Hono Vite</h1> `; Modify package.json Update the scripts section so that the project uses the newly created server.ts as the server entry point. { "name": "app", "private": true, "version": "0.0.0", "type": "module", "scripts": { "dev:vite": "vite", "build": "vite build", "preview": "vite preview", "start": "bun server.ts" }, "devDependencies": { "typescript": "~5.9.3", "vite": "^7.2.4" }, "dependencies": { "hono": "^4.10.7" } } Build the project locally This command generates a dist directory containing all Vite-built static assets. docker run --rm -it -v "$PWD":/app -w /app oven/bun:latest bun run build Run the server locally for testing docker run --rm -it -v "$PWD":/app -w /app -p 3000:3000 -e PORT=3000 oven/bun:latest bun run start You can press Ctrl + C to stop the server when finished. http://127.0.0.1:3000/ http://127.0.0.1:3000/api/hello http://127.0.0.1:3000/api/backend 3. Deployment We create a Linux Web App with a minimum SKU of Premium. Add Environment Variables SCM_DO_BUILD_DURING_DEPLOYMENT=false Purpose: Prevents the deployment environment from packaging during publish. This must also be set in the deployment environment itself. WEBSITE_RUN_FROM_PACKAGE=false Purpose: Instructs Azure Web App not to run the app from a prepackaged file. ENABLE_ORYX_BUILD=false Purpose: Prevents Azure Web App from building after deployment. All build tasks will instead execute during the startup script. Add Startup Command bash /home/site/wwwroot/startup.sh After that, we can return to VS Code and deploy the project. Once the deployment is complete, wait about five minutes for the build process to finish, and then you can begin testing. / /api/hello /api/backend 4. Conclusion Bun + Hono + Vite form a cohesive ecosystem that embodies the next era of JavaScript development: fast, compact, ergonomic, and deeply aligned with modern infrastructure needs. It is particularly well-suited for AI applications, where latency, concurrency, and rapid iteration matter more than ever. From streaming inference endpoints to vector database integrations, this stack offers the responsiveness and scalability essential for AI-powered systems.428Views0likes0CommentsCommon Misconceptions When Running Locally vs. Deploying to Azure Linux-based Web Apps
TOC Introduction Environment Variable Build Time Compatible Memory Conclusion 1. Introduction One of the most common issues during project development is the scenario where “the application runs perfectly in the local environment but fails after being deployed to Azure.” In most cases, deployment logs will clearly reveal the problem and allow you to fix it quickly. However, there are also more complicated situations where "due to the nature of the error itself" relevant logs may be difficult to locate. This article introduces several common categories of such problems and explains how to troubleshoot them. We will demonstrate them using Python and popular AI-related packages, as these tend to exhibit compatibility-related behavior. Before you begin, it is recommended that you read Deployment and Build from Azure Linux based Web App | Microsoft Community Hub on how Azure Linux-based Web Apps perform deployments so you have a basic understanding of the build process. 2. Environment Variable Simulating a Local Flask + sklearn Project First, let’s simulate a minimal Flask + sklearn project in any local environment (VS Code in this example). For simplicity, the sample code does not actually use any sklearn functions; it only displays plain text. app.py from flask import Flask app = Flask(__name__) @app.route("/") def index(): return "hello deploy environment variable" if __name__ == "__main__": app.run(host="0.0.0.0", port=8000) We also preset the environment variables required during Azure deployment, although these will not be used when running locally. .deployment [config] SCM_DO_BUILD_DURING_DEPLOYMENT=false As you may know, the old package name sklearn has long been deprecated in favor of scikit-learn. However, for the purpose of simulating a compatibility error, we will intentionally specify the outdated package name. requirements.txt Flask==3.1.0 gunicorn==23.0.0 sklearn After running the project locally, you can open a browser and navigate to the target URL to verify the result. python3 -m venv .venv source .venv/bin/activate pip install -r requirements.txt python app.py Of course, you may encounter the same compatibility issue even in your local environment. Simply running the following command resolves it: export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True We will revisit this error and its solution shortly. For now, create a Linux Web App running Python 3.12 and configure the following environment variables. We will define Oryx Build as the deployment method. SCM_DO_BUILD_DURING_DEPLOYMENT=false WEBSITE_RUN_FROM_PACKAGE=false ENABLE_ORYX_BUILD=true After deploying the code and checking the Deployment Center, you should see an error similar to the following. From the detailed error message, the cause is clear: sklearn is deprecated and replaced by scikit-learn, so additional compatibility handling is now required by the Python runtime. The error message suggests the following solutions: Install the newer scikit-learn package directly. If your project is deeply coupled to the old sklearn package and cannot be refactored yet, enable compatibility by setting an environment variable to allow installation of the deprecated package. Typically, this type of “works locally but fails on Azure” behavior happens because the deprecated package was installed in the local environment a long time ago at the start of the project, and everything has been running smoothly since. Package compatibility issues like this are very common across various languages on Linux. When a project becomes tightly coupled to an outdated package, you may not be able to upgrade it immediately. In these cases, compatibility workarounds are often the only practical short-term solution. In our example, we will add the environment variable: SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True However, here comes the real problem: This variable is needed during the build phase, but the environment variables set in Azure Portal’s Application Settings only take effect at runtime. So what should we do? The answer is simple, shift the Oryx Build process from build-time to runtime. First, open Azure Portal → Configuration and disable Oryx Build. ENABLE_ORYX_BUILD=false Next, modify the project by adding a startup script. run.sh #!/bin/bash export SKLEARN_ALLOW_DEPRECATED_SKLEARN_PACKAGE_INSTALL=True python -m venv .venv source .venv/bin/activate pip install -r requirements.txt python app.py The startup script works just like the commands you run locally before executing the application. The difference is that you can inject the necessary compatibility environment variables before running pip install or starting the app. After that, return to Azure Portal and add the following Startup Command under Stack Settings. This ensures that your compatibility environment variables and build steps run before the runtime starts. bash run.sh Your overall project structure will now look like this. Once redeployed, everything should work correctly. 3. Build Time Build-Time Errors Caused by AI-Related Packages Many build-time failures are caused by AI-related packages, whose installation processes can be extremely time-consuming. You can investigate these issues by reviewing the deployment logs at the following maintenance URL: https://<YOUR_APP_NAME>.scm.azurewebsites.net/newui Compatible Let’s simulate a Flask + numpy project. The code is shown below. app.py from flask import Flask app = Flask(__name__) @app.route("/") def index(): return "hello deploy compatible" if __name__ == "__main__": app.run(host="0.0.0.0", port=8000) We reuse the same environment variables from the sklearn example. .deployment [config] SCM_DO_BUILD_DURING_DEPLOYMENT=false This time, we simulate the incompatibility between numpy==1.21.0 and Python 3.10. requirements.txt Flask==3.1.0 gunicorn==23.0.0 numpy==1.21.0 We will skip the local execution part and move directly to creating a Linux Web App running Python 3.10. Configure the same environment variables as before, and define the deployment method as runtime build. SCM_DO_BUILD_DURING_DEPLOYMENT=false WEBSITE_RUN_FROM_PACKAGE=false ENABLE_ORYX_BUILD=false After deployment, Deployment Center shows a successful publish. However, the actual website displays an error. At this point, you must check the deployment log files mentioned earlier. You will find two key logs: 1. docker.log Displays real-time logs of the platform creating and starting the container. In this case, you will see that the health probe exceeded the default 230-second startup window, causing container startup failure. This tells us the root cause is container startup timeout. To determine why it timed out, we must inspect the second file. 2. default_docker.log Contains the internal execution logs of the container. Not generated in real time, usually delayed around 15 minutes. Therefore, if docker.log shows a timeout error, wait at least 15 minutes to allow the logs to be written here. In this example, the internal log shows that numpy was being compiled during pip install, and the compilation step took too long. We now have a concrete diagnosis: numpy 1.21.0 is not compatible with Python 3.10, which forces pip to compile from source. The compilation exceeds the platform’s startup time limit (230 seconds) and causes the container to fail. We can verify this by checking numpy’s official site: numpy · PyPI numpy 1.21.0 only provides wheels for cp37, cp38, cp39 but not cp310 (which is python 3.10). Thus, compilation becomes unavoidable. Possible Solutions Set the environment variable WEBSITES_CONTAINER_START_TIME_LIMIT to increase the allowed container startup time. Downgrade Python to 3.9 or earlier. Upgrade numpy to 1.21.0+, where suitable wheels for Python 3.10 are available. In this example, we choose this option. After upgrading numpy to version 1.25.0 (which supports Python 3.10) from specifying in requirements.txt and redeploying, the issue is resolved. numpy · PyPI requirements.txt Flask==3.1.0 gunicorn==23.0.0 numpy==1.25.0 Memory The final example concerns the App Service SKU. AI packages such as Streamlit, PyTorch, and others require significant memory. Any one of these packages may cause the build process to fail due to insufficient memory. The error messages vary widely each time. If you repeatedly encounter unexplained build failures, check Deployment Center or default_docker.log for Exit Code 137, which indicates that the system ran out of memory during the build. The only solution in such cases is to scale up. 4. Conclusion This article introduced several common troubleshooting techniques for resolving Linux Web App issues caused during the build stage. Most of these problems relate to package compatibility, although the symptoms may vary greatly. By understanding the debugging process demonstrated in these examples, you will be better prepared to diagnose and resolve similar issues in future deployments.470Views0likes1CommentUnlocking Client-Side Configuration at Scale with Azure App Configuration and Azure Front Door
As modern apps shift more logic to the browser, Azure App Configuration now brings dynamic configuration directly to client-side applications. Through its integration with Azure Front Door, developers can deliver configuration to thousands or millions of clients with CDN-scale performance while avoiding the need to expose secrets or maintain custom proxy layers. This capability is especially important for AI-powered and agentic client applications, where model settings and behaviors often need to adjust rapidly and safely. This post introduces the new capability, what it unlocks for developers, and how to start building dynamic, configuration-driven client experiences in Azure. App Configuration for Client Applications Centralized Settings and Feature Management App Configuration gives developers a single, consistent place to define configuration settings and feature flags. Until now, this capability was used almost exclusively by server-side applications. With Azure Front Door integration, these same settings can now power modern client experiences across: Single Page Applications (React, Vue, Angular, Next.js, and others using JavaScript) Mobile/ and desktop applications with .Net MAUI JavaScript-powered UI components or embedded widgets running in browser Any browser-based application that can run JavaScript This allows developers to update configuration without redeploying the client app. CDN-Accelerated Configuration Delivery with Azure Front Door Azure Front Door enables client applications to fetch configuration using a fast, globally distributed CDN path. Developers benefit from: High-scale configuration delivery to large client populations Edge caching for fast, low-latency configuration retrieval Reduced load on your backend configuration store through CDN offloading Dedicated endpoint that exposes only the configuration subset it is scoped for. Secure and Scalable Architecture App Configuration integrates with Azure Front Door to deliver configuration to client-side apps using a simple, secure, and CDN-accelerated flow. How it works The browser calls Azure Front Door anonymously, like any CDN asset. Front Door uses managed identity to access App Configuration securely. Only selected key-values, feature flags or snapshots are exposed through Azure Front Door. No secrets or credentials are shipped to the client. Edge caching enables high throughput and low latency configuration delivery. This provides a secure and efficient design for client applications and eliminates the need for custom gateway code or proxy services. Developer Scenarios: What You Can Build CDN-delivered configuration unlocks a range of rich client application scenarios: Client-side feature rollouts for UI components A/B testing or targeted experiences using feature flags Control AI/LLM model parameters and UI behaviors through configuration Dynamically control client-side agent behavior, safety modes, and guardrail settings through configuration Consistent behavior for clients using snapshot-based configuration references These scenarios previously required custom proxies. Now, they work out-of-the-box with Azure App Configuration + Azure Front Door. End-to-End Developer Journey The workflow for enabling client-side configuration with App Configuration is simple: Define key values or feature flags in Azure App Configuration Connect App Configuration to Azure Front Door in the portal Scope configuration exposed by Front Door endpoint with key value or snapshot filter. Use the updated AppConfig JavaScript or .NET provider to connect to Front Door anonymously. Client app fetches configuration via Front Door with CDN performance Update your configuration centrally, no redeployment required To see this workflow end-to-end, check out this demo video. The video shows how to connect an App Configuration store to Azure Front Door and use the Front Door endpoint in a client application. It also demonstrates dynamic feature flag refresh as updates are made in the store. Portal Experience to connect Front Door Once you create your App Configuration store with key values and/or feature flags, you can configure the Front Door connection directly in the Azure portal. The App Configuration portal guides you through connecting a profile, creating an endpoint, and scoping which keys, labels, or snapshots will be exposed to client applications. A detailed “How-To” guide is available in the App Configuration documentation. Using the Front Door Endpoint in Client Applications JavaScript Provider Minimum version for this feature is 2.3.0-preview, get the provider from here. Add below snippet in your code to fetch the key values and/or feature flags from App Configuration through front door. import { loadFromAzureFrontDoor } from "@azure/app-configuration-provider"; const appConfig = await loadFromAzureFrontDoor("https://<your-afd-endpoint>", { featureFlagOptions: { enabled: true }, }); const yoursetting = appConfig.get("<app.yoursetting>"); .NET Provider Minimum version supporting this feature is 8.5.0-preview, get the provider from here builder.Configuration.AddAzureAppConfiguration(options => { options.ConnectAzureFrontDoor(new Uri("https://<your-afd-endpoint>")) .UseFeatureFlags(featureFlagOptions => { featureFlagOptions.Select("<yourappprefix>"); }); }); See our GitHub samples for JavaScript and .NET MAUI for complete client application setups. Notes & Limitations Feature flag scoping requires two key prefix filters, startsWith(".appconfig.featureflag") and ALL keys. Portal Telemetry feature does not reflect client-side consumption yet. This feature is in preview, and currently not supported in Azure sovereign clouds. Conclusion By combining Azure App Configuration with Azure Front Door, developers can now power a new generation of dynamic client applications. Configuration is delivered at CDN speed, securely and at scale letting you update experiences instantly, without redeployment or secret management on client side. This integration brings App Configuration’s flexibility directly to the browser, making it easier to power AI-driven interfaces, agentic workflows, and dynamic user experiences. Try client-side configuration with App Configuration today and update your apps’ behavior in real time, without any redeployments.477Views2likes0CommentsAzure Container Registry Repository Permissions with Attribute-based Access Control (ABAC)
General Availability announcement Today marks the general availability of Azure Container Registry (ACR) repository permissions with Microsoft Entra attribute-based access control (ABAC). ABAC augments the familiar Azure RBAC model with namespace and repository-level conditions so platform teams can express least-privilege access at the granularity of specific repositories or entire logical namespaces. This capability is designed for modern multi-tenant platform engineering patterns where a central registry serves many business domains. With ABAC, CI/CD systems and runtime consumers like Azure Kubernetes Service (AKS) clusters have least-privilege access to ACR registries. Why this matters Enterprises are converging on a central container registry pattern that hosts artifacts and container images for multiple business units and application domains. In this model: CI/CD pipelines from different parts of the business push container images and artifacts only to approved namespaces and repositories within a central registry. AKS clusters, Azure Container Apps (ACA), Azure Container Instances (ACI), and consumers pull only from authorized repositories within a central registry. With ABAC, these repository and namespace permission boundaries become explicit and enforceable using standard Microsoft Entra identities and role assignments. This aligns with cloud-native zero trust, supply chain hardening, and least-privilege permissions. What ABAC in ACR means ACR registries now support a registry permissions mode called “RBAC Registry + ABAC Repository Permissions.” Configuring a registry to this mode makes it ABAC-enabled. When a registry is configured to be ABAC-enabled, registry administrators can optionally add ABAC conditions during standard Azure RBAC role assignments. This optional ABAC conditions scope the role assignment’s effect to specific repositories or namespace prefixes. ABAC can be enabled on all new and existing ACR registries across all SKUs, either during registry creation or configured on existing registries. ABAC-enabled built-in roles Once a registry is ABAC-enabled (configured to “RBAC Registry + ABAC Repository Permissions), registry admins can use these ABAC-enabled built-in roles to grant repository-scoped permissions: Container Registry Repository Reader: grants image pull and metadata read permissions, including tag resolution and referrer discoverability. Container Registry Repository Writer: grants Repository Reader permissions, as well as image and tag push permissions. Container Registry Repository Contributor: grants Repository Reader and Writer permissions, as well as image and tag delete permissions. Note that these roles do not grant repository list permissions. The separate Container Registry Repository Catalog Lister must be assigned to grant repository list permissions. The Container Registry Repository Catalog Lister role does not support ABAC conditions; assigning it grants permissions to list all repositories in a registry. Important role behavior changes in ABAC mode When a registry is ABAC-enabled by configuring its permissions mode to “RBAC Registry + ABAC Repository Permissions”: Legacy data-plane roles such as AcrPull, AcrPush, AcrDelete are not honored in ABAC-enabled registries. For ABAC-enabled registries, use the ABAC-enabled built-in roles listed above. Broad roles like Owner, Contributor, and Reader previously granted full control plane and data plane permissions, which is typically an overprivileged role assignment. In ABAC-enabled registries, these broad roles will only grant control plane permissions to the registry. They will no longer grant data plane permissions, such as image push, pull, delete or repository list permissions. ACR Tasks, Quick Tasks, Quick Builds, and Quick Runs no longer inherit default data-plane access to source registries; assign the ABAC-enabled roles above to the calling identity as needed. Identities you can assign ACR ABAC uses standard Microsoft Entra role assignments. Assign RBAC roles with optional ABAC conditions to users, groups, service principals, and managed identities, including AKS kubelet and workload identities, ACA and ACI identities, and more. Next steps Start using ABAC repository permissions in ACR to enforce least-privilege artifact push, pull, and delete boundaries across your CI/CD systems and container image workloads. This model is now the recommended approach for multi-tenant platform engineering patterns and central registry deployments. To get started, follow the step-by-step guides in the official ACR ABAC documentation: https://aka.ms/acr/auth/abac1.1KViews1like0Comments