# Best Practices
## Who Created This Azure Resource? Here's How to Find Out
One of the most common questions Azure customers and administrators ask is: "How do I know who created this resource?" If you've ever been in charge of managing a large subscription with dozens (or even thousands) of resources, you know how important it is to answer this question quickly. Whether it's for troubleshooting, governance, or compliance, tracking the origin of a resource can save time and reduce confusion. The good news: Azure makes this information available. You just need to know where to look.

### Step 1: Open the Resource Overview

Navigate to the Overview page of the resource in question. This gives you the usual metadata like resource group, subscription, location, login server, and provisioning state. At first glance, however, you won't see who created the resource. That information isn't shown in the overview fields.

### Step 2: Switch to JSON View

On the Overview page, look for the link labeled "JSON View" in the top right corner. Clicking this opens the full resource definition in JSON format.

### Step 3: Scroll to the systemData Section

Within the JSON, scroll until you find the systemData object. This is where Azure tracks metadata about the resource lifecycle. Here's what you'll find:

```json
"systemData": {
  "createdBy": "someuser@domain.com",
  "createdByType": "User",
  "createdAt": "2025-05-20T19:50:33.1511397Z",
  "lastModifiedBy": "someuser@domain.com",
  "lastModifiedByType": "User",
  "lastModifiedAt": "2025-05-20T19:50:33.1511397Z"
}
```

### What This Tells You

- createdBy → The user or service principal that created the resource.
- createdByType → Whether it was created by a human user, managed identity, or another Azure service.
- createdAt → The exact timestamp of creation (UTC).
- lastModifiedBy, lastModifiedByType, and lastModifiedAt → Useful if the resource was updated after creation.

This metadata gives you clear visibility into who provisioned the resource and when.

### Why It Matters

- Governance – Understand ownership and responsibility.
- Troubleshooting – Track down configuration changes.
- Compliance & Auditing – Satisfy requirements for accountability in your cloud environment.

By making the systemData object part of your standard investigation checklist, you'll save yourself the guesswork the next time you're wondering, "Who created this resource?"
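If you prefer the command line to the portal, the same metadata can be pulled over the ARM REST API. The snippet below is a minimal sketch and not part of the original walkthrough: the resource ID and API version are placeholders you must replace, and whether `systemData` is populated varies by resource provider.

```bash
#!/usr/bin/env bash
# Minimal sketch: read systemData for a single resource via az rest.
# Assumes: you are signed in (az login), RESOURCE_ID is replaced with your own,
# and the resource provider populates systemData (most do, but not all).

RESOURCE_ID="/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Web/sites/<app-name>"
API_VERSION="2023-12-01"   # pick an API version that is valid for the resource type

az rest \
  --method get \
  --url "https://management.azure.com${RESOURCE_ID}?api-version=${API_VERSION}" \
  --query "systemData" \
  --output json
```

If the response contains no systemData block, that provider does not expose it, and the Activity Log is the usual fallback for finding the caller of the create operation.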
## Search Less, Build More: Inner Sourcing with GitHub Copilot and ADO MCP Server

Developers burn cycles context-switching: opening five repos to find a logging example, searching a wiki for a data masking rule, scrolling chat history for the latest pipeline pattern. Organisations that I speak to are often on the path of transformational platform engineering projects, but always have the fear or doubt of "what if my engineers don't use these resources?". While projects like Backstage still play a pivotal role in inner sourcing and discoverability, I also empathise with developers who would argue, "How would I even know in the first place which modules have or haven't been created for reuse?"

In this blog we explore how we can ensure organisational standards and developer satisfaction without any heavy lifting on either side: no custom model training, no rewriting or relocating of repositories, and no stagnant local data. Using GitHub Copilot + the Azure DevOps MCP server (with the free `code_search` extension), we turn the IDE into an organisational knowledge interface. Instead of guessing or re-implementing, engineers can start scaffolding projects or solving issues as they would normally (hopefully using Copilot) and without extra prompting. GitHub Copilot can lean into organisational standards and ensure recommendations are made with code snippets directly generated from existing examples.

### What Is the Azure DevOps MCP Server + code_search Extension?

MCP (Model Context Protocol) is an open standard that lets agents (like GitHub Copilot) pull in structured, on-demand context from external systems. MCP servers contain natural language explanations of the tools that the agent can utilise, allowing dynamic decisions about when to use certain toolsets over others. The Azure DevOps MCP Server is the ADO product team's implementation of that standard. It exposes your ADO environment in a way Copilot can consume. Out of the box it gives you access to:

- Projects – list and navigate across projects in your organization.
- Repositories – browse repos, branches, and files.
- Work items – surface user stories, bugs, or acceptance criteria.
- Wikis – pull policies, standards, and documentation.

This means Copilot can ground its answers in live ADO content, instead of hallucinating or relying only on what's in the current editor window. The ADO server runs locally from your own machine to ensure that all sensitive project information remains within your secure network boundary. This also means that existing permissions on ADO objects such as Projects or Repositories are respected.

The wiki search tooling available out of the box with the ADO MCP server is very useful; however, if I am honest, I have seen these wikis go unused, with documentation stored elsewhere, either inside the repository or in a project management tool. This means any tool that needs to implement code requires the ability to accurately search the code stored in the repositories themselves. That is where enabling the code_search extension in ADO is so important. Most organisations have this enabled already, but it is worth noting that this prerequisite is the real unlock of cross-repo search. It allows Copilot to:

- Query for symbols, snippets, or keywords across all repos.
- Retrieve usage examples from code, not just docs.
- Locate standards (like logging wrappers or retry policies) wherever they live.
- Back every recommendation with specific source lines.

In short: MCP connects Copilot to Azure DevOps. code_search makes that connection powerful by turning it into a discovery engine.
### What is the relevance of Copilot instructions?

One of the less obvious but most powerful features of GitHub Copilot is its ability to follow instructions files. Copilot automatically looks for these files and uses them as a "playbook" for how it should behave. There are different types of instructions you can provide:

- Organisational instructions – apply across your entire workspace, regardless of which repo you're in.
- Repo-specific instructions – scoped to a particular repository, useful when one project has unique standards or patterns.
- Personal instructions – smaller overrides layered on top of global rules when a local exception applies (stored in .github/copilot-instructions.md).

In this solution, I'm using a single personal instructions file. It tells Copilot:

- When to search (e.g., always query repos and wikis before answering a standards question).
- Where to look (Azure DevOps repos, wikis, and, with code_search, the code itself).
- How to answer (responses must cite the repo/file/line or wiki page; if no source is found, say so).
- How to resolve conflicts (prefer dated wiki entries over older README fragments).

As a small example, a section of a Copilot instructions file could look like this:

````markdown
# GitHub Copilot Instructions for Azure DevOps MCP Integration

This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request.

## Core Principles

### 1. Azure DevOps Integration
- **Always prioritize Azure DevOps MCP tools** when users ask about:
  - Work items, stories, bugs, tasks
  - Pull requests and code reviews
  - Build pipelines and deployments
  - Repository operations and branch management
  - Wiki pages and documentation
  - Test plans and test cases
  - Project and team information

### 2. Organizational Context Awareness
- Before suggesting solutions, **check existing organizational patterns** by:
  - Searching code across repositories for similar implementations
  - Referencing established coding standards and frameworks
  - Looking for existing shared libraries and utilities
  - Checking architectural decision records (ADRs) in wikis

### 3. Cross-Repository Intelligence
- When providing code suggestions:
  - **Search for existing patterns** in other repositories first
  - **Reference shared libraries** and common utilities
  - **Maintain consistency** with organizational standards
  - **Suggest reusable components** when appropriate

## Tool Usage Guidelines

### Work Items and Project Management
When users mention bugs, features, tasks, or project planning:
```
✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item
✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration
✅ Use: work_list_team_iterations, core_list_projects
```
````

### The result...

To test this I created three ADO projects, each with one or two repositories. The repositories were light, with only ReadMes inside containing descriptions of the "repo" and some code snippet examples for usage. I then created a brand-new workspace with no context apart from a Copilot instructions document (which could be part of a repo scaffold or organisation wide) which tells Copilot to search code and the wikis across all ADO projects in my demo environment. It returns guidance and standards from all available repos and starts to use them to formulate its response. In the screenshot I have highlighted some key parts with red boxes.
The first is a section of the readme that Copilot has identified in its response; that part is also highlighted within the Copilot chat response. I have highlighted the rather generic prompt I used to get this response at the bottom of that window too. Above it, I have highlighted Copilot using the MCP server tooling, searching through projects, repos and code. Finally, the largest box highlights the instructions given to Copilot on how to search, and how easily these could be optimised or changed depending on the requirements and organisational coding standards.

### How did I implement this?

Implementation is actually incredibly simple. As mentioned, I created multiple projects and repositories within my ADO organisation in order to test cross-project and cross-repo discovery. I then did the following:

1. Enable code_search in your Azure DevOps organization (Marketplace → install extension).
2. Log in to Azure – use the Azure CLI to authenticate to Azure with "az login".
3. Create a .vscode/mcp.json file – a snippet is provided below; the organisation name should be changed to your organisation's name.
4. Start and enable your MCP server – in the mcp.json file you should see a "Start" button. Using the snippet below, you will be prompted to add your organisation name. Ensure your Copilot agent has access to the server under "tools" too. View this setup guide for full setup instructions (azure-devops-mcp/docs/GETTINGSTARTED.md at main · microsoft/azure-devops-mcp).
5. Create a Copilot instructions file with a search-first directive. I have inserted the full version used in this demo at the bottom of the article.
6. Experiment with prompts – start generic ("How do we secure APIs?"), review the output and the tools used, and then tailor your instructions.

### Considerations

While this is a great approach, I do still have some considerations when going to production:

- Latency – using MCP tooling on every request will add some latency to developer requests. We can look at optimising usage through Copilot instructions to better identify when Copilot should or shouldn't use the ADO MCP server.
- Complex projects and repositories – while I have demonstrated cross-project and cross-repository retrieval, my demo environment does not truly simulate an enterprise ADO environment. Performance should be tested and closely monitored as organisational complexity increases.
- Public preview – the ADO MCP server is moving quickly but is currently still in public preview.

We have demonstrated in this article how quickly we can make our Azure DevOps content discoverable. While there are considerations moving forward, this showcases a direction towards agentic inner sourcing. Feel free to comment below how you think this approach could be extended or augmented for other use cases!

### Resources

MCP Server Config (/.vscode/mcp.json):

```json
{
  "inputs": [
    {
      "id": "ado_org",
      "type": "promptString",
      "description": "Azure DevOps organization name (e.g. 'contoso')"
    }
  ],
  "servers": {
    "ado": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@azure-devops/mcp", "${input:ado_org}"]
    }
  }
}
```

Copilot Instructions (/.github/copilot-instructions.md):

````markdown
# GitHub Copilot Instructions for Azure DevOps MCP Integration

This project uses Azure DevOps with MCP server integration to provide organizational context awareness. Always check to see if the Azure DevOps MCP server has a tool relevant to the user's request.

## Core Principles

### 1. Azure DevOps Integration
- **Always prioritize Azure DevOps MCP tools** when users ask about:
  - Work items, stories, bugs, tasks
  - Pull requests and code reviews
  - Build pipelines and deployments
  - Repository operations and branch management
  - Wiki pages and documentation
  - Test plans and test cases
  - Project and team information

### 2. Organizational Context Awareness
- Before suggesting solutions, **check existing organizational patterns** by:
  - Searching code across repositories for similar implementations
  - Referencing established coding standards and frameworks
  - Looking for existing shared libraries and utilities
  - Checking architectural decision records (ADRs) in wikis

### 3. Cross-Repository Intelligence
- When providing code suggestions:
  - **Search for existing patterns** in other repositories first
  - **Reference shared libraries** and common utilities
  - **Maintain consistency** with organizational standards
  - **Suggest reusable components** when appropriate

## Tool Usage Guidelines

### Work Items and Project Management
When users mention bugs, features, tasks, or project planning:
```
✅ Use: wit_my_work_items, wit_create_work_item, wit_update_work_item
✅ Use: wit_list_backlogs, wit_get_work_items_for_iteration
✅ Use: work_list_team_iterations, core_list_projects
```

### Code and Repository Operations
When users ask about code, branches, or pull requests:
```
✅ Use: repo_list_repos_by_project, repo_list_pull_requests_by_repo
✅ Use: repo_list_branches_by_repo, repo_search_commits
✅ Use: search_code for finding patterns across repositories
```

### Documentation and Knowledge Sharing
When users need documentation or want to create/update docs:
```
✅ Use: wiki_list_wikis, wiki_get_page_content, wiki_create_or_update_page
✅ Use: search_wiki for finding existing documentation
```

### Build and Deployment
When users ask about builds, deployments, or CI/CD:
```
✅ Use: pipelines_get_builds, pipelines_get_build_definitions
✅ Use: pipelines_run_pipeline, pipelines_get_build_status
```

## Response Patterns

### 1. Discovery First
Before providing solutions, always discover organizational context:
```
"Let me first check what patterns exist in your organization..."
→ Search code, check repositories, review existing work items
```

### 2. Reference Organizational Standards
When suggesting code or approaches:
```
"Based on patterns I found in your [RepositoryName] repository..."
"Following your organization's standard approach seen in..."
"This aligns with the pattern established in [TeamName]'s implementation..."
```

### 3. Actionable Integration
Always offer to create or update Azure DevOps artifacts:
```
"I can create a work item for this enhancement..."
"Should I update the wiki page with this new pattern?"
"Let me link this to the current iteration..."
```

## Specific Scenarios

### New Feature Development
1. **Search existing repositories** for similar features
2. **Check architectural patterns** and shared libraries
3. **Review related work items** and planning documents
4. **Suggest implementation** based on organizational standards
5. **Offer to create work items** and documentation

### Bug Investigation
1. **Search for similar issues** across repositories and work items
2. **Check related builds** and recent changes
3. **Review test results** and failure patterns
4. **Provide solution** based on organizational practices
5. **Offer to create/update** bug work items and documentation

### Code Review and Standards
1. **Compare against organizational patterns** found in other repositories
2. **Reference coding standards** from wiki documentation
3. **Suggest improvements** based on established practices
4. **Check for reusable components** that could be leveraged

### Documentation Requests
1. **Search existing wikis** for related content
2. **Check for ADRs** and technical documentation
3. **Reference patterns** from similar projects
4. **Offer to create/update** wiki pages with findings

## Error Handling

If Azure DevOps MCP tools are not available or fail:
1. **Inform the user** about the limitation
2. **Provide alternative approaches** using available information
3. **Suggest manual steps** for Azure DevOps integration
4. **Offer to help** with configuration if needed

## Best Practices

### Always Do:
- ✅ Search organizational context before suggesting solutions
- ✅ Reference existing patterns and standards
- ✅ Offer to create/update Azure DevOps artifacts
- ✅ Maintain consistency with organizational practices
- ✅ Provide actionable next steps

### Never Do:
- ❌ Suggest solutions without checking organizational context
- ❌ Ignore existing patterns and implementations
- ❌ Provide generic advice when specific organizational context is available
- ❌ Forget to offer Azure DevOps integration opportunities

---

**Remember: The goal is to provide intelligent, context-aware assistance that leverages the full organizational knowledge base available through Azure DevOps while maintaining development efficiency and consistency.**
````

## Let's move away from API keys!
**35% of all exposed API keys are still active.**

That's a pretty scary statistic: it means services are most likely being misused, and data is actively being stolen. Let's try to understand why the concept of API keys is a problematic one and how we can move away from it.

## Announcing the General Availability of New Availability Zone Features for Azure App Service
### What are Availability Zones?

Availability Zones, or zone redundancy, refers to the deployment of applications across multiple availability zones within an Azure region. Each availability zone consists of one or more data centers with independent power, cooling, and networking. By leveraging zone redundancy, you can protect your applications and data from data center failures, ensuring uninterrupted service.

### Key Updates

- The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two, while still maintaining a 99.99% SLA.
- Many existing App Service plans with two or more instances will automatically support Availability Zones without additional setup.
- The zone redundant setting for App Service plans and App Service Environment v3 is now mutable throughout the life of the resources.
- Enhanced visibility into Availability Zone information, including physical zone placement and zone counts, is now provided.
- For App Service Environment v3, the minimum instance fee for enabling Availability Zones has been removed, aligning the pricing model with the multi-tenant App Service offering.

The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two. You can now enjoy the benefits of Availability Zones with just two instances, since we continue to uphold a 99.99% SLA even with the two-instance configuration. Many existing App Service plans with two or more instances will automatically support Availability Zones without necessitating additional setup. Over the past few years, efforts have been made to ensure that the App Service footprint supports Availability Zones wherever possible, and we've made significant gains in doing so. Therefore, many existing customers can enable Availability Zones on their current deployments without needing to redeploy.

Along with supporting the 2-instance Availability Zone configuration, we have enabled Availability Zones on the App Service footprint in regions where only two zones may be available. Previously, enabling Availability Zones required a region to have three zones with sufficient capacity. To account for the growing demand, we now support Availability Zone deployments in regions with just two zones. This allows us to provide you with Availability Zone features across more regions. And with that, we are upholding the 99.99% SLA even with the 2-zone configuration.

Additionally, we are pleased to announce that the zone redundant setting (zoneRedundant property) for App Service plans and App Service Environment v3 is now mutable throughout the life of these resources. This enhancement allows customers on Premium V2, Premium V3, or Isolated V2 plans to toggle zone redundancy on or off as required. With this capability, you can reduce costs and scale to a single instance when multiple instances are not necessary. Conversely, you can scale out and enable zone redundancy at any time to meet your requirements. This ability has been requested for a while now and we are excited to finally make it available. For App Service Environment v3 users, this also means that your individual App Service plan zone redundancy status is now independent of other plans in your App Service Environment. This means that you can have a mix of zone redundant and non-zone redundant plans in an App Service Environment, something that was previously not supported.
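For readers who want to script the toggle before Azure Portal support arrives, the following is a minimal sketch using the generic Azure CLI resource commands rather than the officially documented ARM/Bicep path. The resource group, plan name, instance count, and the `maximumNumberOfZones` property path are assumptions based on this announcement; verify them against the Reliability in Azure App Service documentation before relying on them.

```bash
#!/usr/bin/env bash
# Minimal sketch: flip zone redundancy on an existing Premium V3 plan.
# Assumes: az login already done, and the plan sits on a footprint that
# supports zones (the announcement's MaximumNumberOfZones property > 1).

RG="my-resource-group"      # placeholder
PLAN="my-premiumv3-plan"    # placeholder

# 1. Check how many zones the plan's footprint supports
#    (property name taken from this post; treat as an assumption).
az resource show \
  --resource-group "$RG" --name "$PLAN" \
  --resource-type "Microsoft.Web/serverfarms" \
  --query "properties.maximumNumberOfZones"

# 2. Scale to at least two instances and enable zone redundancy.
az resource update \
  --resource-group "$RG" --name "$PLAN" \
  --resource-type "Microsoft.Web/serverfarms" \
  --set properties.zoneRedundant=true sku.capacity=2
```

Because the setting is now mutable, the same call with `properties.zoneRedundant=false` can turn zone redundancy back off when you scale down.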
In addition to these new features, we also have a couple of other exciting things to share. We are now providing enhanced visibility into Availability Zone information, including the physical zone placement of your instances and zone counts. For our App Service Environment v3 customers, we have removed the minimum instance fee for enabling Availability Zones. This means that you now only pay for the Isolated V2 instances you consume. This aligns the pricing model with the multi-tenant App Service offering.

For more information as well as guidance on how to use these features, see the docs: Reliability in Azure App Service. Azure Portal support for these new features will be available by mid-June 2025. In the meantime, see the documentation to use these new features with ARM/Bicep or the Azure CLI. Also check out the BRK200 breakout session at Microsoft Build 2025, live on May 20th or anytime after via the recording, where my team and I will be discussing these new features and many more exciting announcements for Azure App Service. If you're in the Seattle area and attending Microsoft Build 2025 in person, come meet my team and me at our Expert Meetup Booth.

### FAQ

**Q: What are availability zones?**
Availability zones are physically separate locations within an Azure region, each consisting of one or more data centers with independent power, cooling, and networking. Deploying applications across multiple availability zones ensures high availability and business continuity.

**Q: How do I enable Availability Zones for my existing App Service plan or App Service Environment v3?**
There is a new toggle in the Azure portal that will be enabled if your App Service plan or App Service Environment v3 supports Availability Zones. Your deployment must be on the App Service footprint that supports zones in order to have this capability. There is a new property called "MaximumNumberOfZones", which indicates the number of zones your deployment supports. If this value is greater than one, you are on the footprint that supports zones and can enable Availability Zones as long as you have two or more instances. If this value is equal to one, you need to redeploy. Note that we are continually working to expand the zone footprint across more App Service deployments.

**Q: Is there an additional charge for Availability Zones?**
There is no additional charge; you only pay for the instances you use. The only requirement is that you use two or more instances.

**Q: Can I change the zone redundant property after creating my App Service plan?**
Yes, the zone redundant property is now mutable, meaning you can toggle it on or off at any time.

**Q: How can I verify the zone redundancy status of my App Service plans?**
We now display the physical zone for each instance, helping you verify zone redundancy status for audits and compliance reviews.

**Q: How do I use these new features?**
You can use ARM/Bicep or the Azure CLI at this time. Starting in mid-June, Azure Portal support should be available. The documentation currently shows how to use ARM/Bicep and the Azure CLI to enable these features. The documentation as well as this blog post will be updated once Azure Portal support is available.

**Q: Are Availability Zones supported on Premium V4?**
Yes! See the documentation for more details on how to get started with Premium V4 today.

## How to Secure your pro-code Custom Engine Agent of Microsoft 365 Copilot?
### Prerequisite

This article assumes that you've already gone through the following post. Please make sure to read it before proceeding: Developing a Custom Engine Agent for Microsoft 365 Copilot Chat Using Pro-Code.

In that article, we covered how to publish a Custom Engine Agent using pro-code approaches such as C#. In this post, I'd like to shift the focus to security, specifically how to protect the endpoint of our custom Microsoft 365 Copilot. Through several architectural explorations, we found an approach that seems to work well. However, I strongly encourage you to review and evaluate it carefully for your production environment.

### Which Endpoints Can Be Controlled?

In the current architecture, there are three key endpoints to consider from a security perspective:

1. Teams Endpoint – the entry point where users interact with the Custom Engine Agent through Microsoft Teams.
2. Azure Bot Service Endpoint – the publicly accessible endpoint provided by Azure Bot Service that relays messages between Teams and your bot backend.
3. ASP.NET Core Endpoint – in the previous article, we used a local devtunnel for development purposes. In a production environment, however, this would likely be hosted on Azure App Service or another service.

Each of these endpoints may require different protection strategies, which we'll explore in the following sections.

### 1. Controlling the Teams Endpoint

When it comes to the Teams endpoint, control ultimately comes down to Teams app management within your Microsoft 365 tenant. Specifically, the manifest file for your custom Teams app (i.e., the Custom Agent) needs to be uploaded in your tenant, and access is governed via the Teams Admin Center. This isn't about controlling the endpoint, but rather about limiting who can access the app. You can restrict access on a per-user or per-group basis, effectively preventing malicious users inside your organization from using the app. However, you cannot restrict access at the endpoint level, nor could you prevent a malicious external organization from copying the app package. This limitation may pose a concern, especially when thinking about endpoint-level security outside your tenant's control.

### 2. Controlling the Azure Bot Service Endpoint

The Azure Bot Service endpoint acts as a bridge between the Teams channel and your pro-code backend. Here, the only available security configuration is to specify the Service Principal that the agent uses. There isn't much room for granular control here; it's essentially a relay point managed by Azure Bot Service, and protection depends largely on how you secure the endpoints it connects to.

### 3. Controlling the ASP.NET Core Endpoint

This is where endpoint protection becomes critical. When you configure your bot in Azure Bot Service, you must expose your pro-code endpoint to the public internet. In our earlier article, we used a local devtunnel for development. But in production, you'll likely use Azure App Service or another service, which results in a publicly accessible endpoint. While Microsoft provides documentation on network isolation options for Azure Bot Service, these are currently only supported when using the Direct Line channel, not the Teams channel. This means that when using Teams as the entry point, you cannot isolate the backend endpoint via a private network, making it critical to implement other security measures at the app level (e.g., token validation, IP restrictions, mutual TLS, etc.).
Reference: https://learn.microsoft.com/en-us/azure/bot-service/dl-network-isolation-concept?view=azure-bot-service-4.0

### Let's Review Other Articles on this Topic

There are several valuable resources that describe this topic. Since Microsoft Teams is a SaaS application, the bot endpoint (e.g., https://my-webapp-endpoint.net/api/messages) must be publicly accessible when integrated through the Teams channel.

- Is it possible to integrate Azure Bot with Teams without public access?
- How to create Azure Bot Service in a private network?

In particular, this article provides an excellent deep dive into the traffic flow between Teams and Azure Bot Service: Azure Bot Service, Microsoft Teams architecture, and message flow. In the section titled "Challenge 2: Network isolation vs. Teams connectivity," the article clearly explains why network-level isolation is fundamentally incompatible with the Teams channel. The article also outlines a practical security approach using Azure Firewall, NSG (Network Security Groups), and JWT token validation at the application level.

If you're using the Teams channel, complete network isolation is not feasible, which makes sense given that Teams itself is a SaaS platform and cannot be brought into your private network. As a result, protecting the backend bot (e.g., the ASP.NET Core endpoint) will require application-level controls, particularly JWT token validation to ensure that only trusted sources can invoke the bot. Let's now take a closer look at how to implement that in C#.

### Controlling Endpoints in the ASP.NET Core Application

So, what does endpoint control look like at the application level? Let's return to the ASP.NET Core side of things and take a closer look at the default project structure. If you recall, the Program.cs in the template project contains a specific line worth revisiting. This configuration plays an important role in how the application handles and secures incoming requests. Let's take a look at that setup.

```csharp
// Register the WeatherForecastAgent
builder.Services.AddTransient<WeatherForecastAgent>();

// Add AspNet token validation - ** HERE **
builder.Services.AddBotAspNetAuthentication(builder.Configuration);

// Register IStorage. For development, MemoryStorage is suitable.
// For production Agents, persisted storage should be used so
// that state survives Agent restarts, and operate correctly
// in a cluster of Agent instances.
builder.Services.AddSingleton<IStorage, MemoryStorage>();
```

As it turns out, the AddBotAspNetAuthentication method referenced earlier in Program.cs is actually defined in the same project, within a file named AspNetExtensions.cs. This method is where access token validation is implemented and enforced. Let's take a closer look at a key portion of the AddBotAspNetAuthentication method from AspNetExtensions.cs:

```csharp
public static void AddBotAspNetAuthentication(this IServiceCollection services, IConfiguration configuration, string tokenValidationSectionName = "TokenValidation", ILogger logger = null)
{
    IConfigurationSection tokenValidationSection = configuration.GetSection(tokenValidationSectionName);
    List<string> validTokenIssuers = tokenValidationSection.GetSection("ValidIssuers").Get<List<string>>();
    List<string> audiences = tokenValidationSection.GetSection("Audiences").Get<List<string>>();

    if (!tokenValidationSection.Exists())
    {
        logger?.LogError("Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json", tokenValidationSectionName);
        throw new InvalidOperationException($"Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json");
    }

    // If ValidIssuers is empty, default for ABS Public Cloud
    if (validTokenIssuers == null || validTokenIssuers.Count == 0)
    {
        validTokenIssuers =
        [
            "https://api.botframework.com",
            "https://sts.windows.net/d6d49420-f39b-4df7-a1dc-d59a935871db/",
            "https://login.microsoftonline.com/d6d49420-f39b-4df7-a1dc-d59a935871db/v2.0",
            "https://sts.windows.net/f8cdef31-a31e-4b4a-93e4-5f571e91255a/",
            "https://login.microsoftonline.com/f8cdef31-a31e-4b4a-93e4-5f571e91255a/v2.0",
            "https://sts.windows.net/69e9b82d-4842-4902-8d1e-abc5b98a55e8/",
            "https://login.microsoftonline.com/69e9b82d-4842-4902-8d1e-abc5b98a55e8/v2.0",
        ];

        string tenantId = tokenValidationSection["TenantId"];
        if (!string.IsNullOrEmpty(tenantId))
        {
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV1, tenantId));
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV2, tenantId));
        }
    }

    if (audiences == null || audiences.Count == 0)
    {
        throw new ArgumentException($"{tokenValidationSectionName}:Audiences requires at least one value");
    }

    bool isGov = tokenValidationSection.GetValue("IsGov", false);
    bool azureBotServiceTokenHandling = tokenValidationSection.GetValue("AzureBotServiceTokenHandling", true);

    // If the `AzureBotServiceOpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`. This is what is used to authenticate ABS tokens.
    string azureBotServiceOpenIdMetadataUrl = tokenValidationSection["AzureBotServiceOpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(azureBotServiceOpenIdMetadataUrl))
    {
        azureBotServiceOpenIdMetadataUrl = isGov ? AuthenticationConstants.GovAzureBotServiceOpenIdMetadataUrl : AuthenticationConstants.PublicAzureBotServiceOpenIdMetadataUrl;
    }

    // If the `OpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`. This is what is used to authenticate Entra ID tokens.
    string openIdMetadataUrl = tokenValidationSection["OpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(openIdMetadataUrl))
    {
        openIdMetadataUrl = isGov ? AuthenticationConstants.GovOpenIdMetadataUrl : AuthenticationConstants.PublicOpenIdMetadataUrl;
    }

    TimeSpan openIdRefreshInterval = tokenValidationSection.GetValue("OpenIdMetadataRefresh", BaseConfigurationManager.DefaultAutomaticRefreshInterval);

    _ = services.AddAuthentication(options =>
    {
        options.DefaultAuthenticateScheme = JwtBearerDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = JwtBearerDefaults.AuthenticationScheme;
    })
    .AddJwtBearer(options =>
    {
        options.SaveToken = true;
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ClockSkew = TimeSpan.FromMinutes(5),
            ValidIssuers = validTokenIssuers,
            ValidAudiences = audiences,
            ValidateIssuerSigningKey = true,
            RequireSignedTokens = true,
        };

        // Using Microsoft.IdentityModel.Validators
        options.TokenValidationParameters.EnableAadSigningKeyIssuerValidation();

        options.Events = new JwtBearerEvents
        {
            // Create a ConfigurationManager based on the requestor. This is to handle ABS non-Entra tokens.
            OnMessageReceived = async context =>
            {
                string authorizationHeader = context.Request.Headers.Authorization.ToString();

                if (string.IsNullOrEmpty(authorizationHeader))
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

                string[] parts = authorizationHeader?.Split(' ');
                if (parts.Length != 2 || parts[0] != "Bearer")
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

                JwtSecurityToken token = new(parts[1]);
                string issuer = token.Claims.FirstOrDefault(claim => claim.Type == AuthenticationConstants.IssuerClaim)?.Value;

                if (azureBotServiceTokenHandling && AuthenticationConstants.BotFrameworkTokenIssuer.Equals(issuer))
                {
                    // Use the Bot Framework authority for this configuration manager
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(azureBotServiceOpenIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(azureBotServiceOpenIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }
                else
                {
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(openIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(openIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }

                await Task.CompletedTask.ConfigureAwait(false);
            },
            OnTokenValidated = context =>
            {
                logger?.LogDebug("TOKEN Validated");
                return Task.CompletedTask;
            },
            OnForbidden = context =>
            {
                logger?.LogWarning("Forbidden: {m}", context.Result.ToString());
                return Task.CompletedTask;
            },
            OnAuthenticationFailed = context =>
            {
                logger?.LogWarning("Auth Failed {m}", context.Exception.ToString());
                return Task.CompletedTask;
            }
        };
    });
}
```

From examining the code, we can see that it reads configuration settings from the appsettings.{your-env}.json file and uses them during token validation. In particular, the following line stands out:

```csharp
TokenValidationParameters.ValidAudiences = audiences;
```

This ensures that only tokens issued for the configured audience (i.e., your Azure Bot Service's Service Principal) will be accepted. Any requests carrying tokens with mismatched audiences will be rejected during validation.

One critical observation is that if no access token is provided at all, the code effectively lets the request through without enforcing validation. This means that if the Service Principal is misconfigured or lacks proper permissions, and therefore no token is issued with the request, the bot may still continue processing it without rejecting the request. This could potentially create a security loophole, especially if the backend API is publicly accessible.
```csharp
OnMessageReceived = async context =>
{
    string authorizationHeader = context.Request.Headers.Authorization.ToString();

    if (string.IsNullOrEmpty(authorizationHeader))
    {
        // Default to AadTokenValidation handling
        context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
        await Task.CompletedTask.ConfigureAwait(false);
        return;
    }
```

### Additional Security Concerns and Improvements

Another point worth noting about the current code is that the Custom Engine Agent app can be copied and uploaded to a different Entra ID tenant, and it would still work. (Admittedly, this might be intentional, since the architecture assumes providing Custom Engine Agent services to multiple organizations.) The project template and Teams settings raise two key security concerns that we should address:

1. Reject requests when the token is missing; the token should not be empty.
2. Block access from unknown or unauthorized Entra ID tenants.

To enforce the above, you will need to update the Service Principal configuration accordingly. Specifically, open the Service Principal's API permissions tab and add the following permission: User.Read.All. Without this permission, access tokens will not be issued, making token validation impossible.

After updating the Service Principal permissions, run your ASP.NET Core app and set a breakpoint around the following code to inspect the contents of the token included in the Authorization header. This will help you verify whether the token is correctly issued and contains the expected claims.

```csharp
string authorizationHeader = context.Request.Headers.Authorization.ToString();
```

The token is Base64 encoded, so let's decode it to inspect its contents. I asked Copilot to help decode the token so we can better understand the claims and data included inside. After decoding the token (some parts are redacted for privacy), we can see that:

- The aud (audience) claim contains the Service Principal's client ID.
- The serviceurl claim includes the Entra ID tenant ID.

I attempted to configure the authorization settings to include the Tenant ID directly in the access token claims, but was not successful this time. Below is a sample code snippet that implements the following requirements:

1. Reject requests with an empty or missing token.
2. Deny access from unknown Entra ID tenants.

This is the sample code for "1. Reject requests with an empty or missing token". I've added comments in the code to clearly indicate what was changed.

```csharp
public static void AddBotAspNetAuthentication(this IServiceCollection services, IConfiguration configuration, string tokenValidationSectionName = "TokenValidation", ILogger logger = null)
{
    IConfigurationSection tokenValidationSection = configuration.GetSection(tokenValidationSectionName);
    List<string> validTokenIssuers = tokenValidationSection.GetSection("ValidIssuers").Get<List<string>>();
    List<string> audiences = tokenValidationSection.GetSection("Audiences").Get<List<string>>();

    if (!tokenValidationSection.Exists())
    {
        logger?.LogError("Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json", tokenValidationSectionName);
        throw new InvalidOperationException($"Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json");
    }

    // If ValidIssuers is empty, default for ABS Public Cloud
    if (validTokenIssuers == null || validTokenIssuers.Count == 0)
    {
        validTokenIssuers =
        [
            "https://api.botframework.com",
            "https://sts.windows.net/d6d49420-f39b-4df7-a1dc-d59a935871db/",
            "https://login.microsoftonline.com/d6d49420-f39b-4df7-a1dc-d59a935871db/v2.0",
            "https://sts.windows.net/f8cdef31-a31e-4b4a-93e4-5f571e91255a/",
            "https://login.microsoftonline.com/f8cdef31-a31e-4b4a-93e4-5f571e91255a/v2.0",
            "https://sts.windows.net/69e9b82d-4842-4902-8d1e-abc5b98a55e8/",
            "https://login.microsoftonline.com/69e9b82d-4842-4902-8d1e-abc5b98a55e8/v2.0",
        ];

        string tenantId = tokenValidationSection["TenantId"];
        if (!string.IsNullOrEmpty(tenantId))
        {
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV1, tenantId));
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV2, tenantId));
        }
    }

    if (audiences == null || audiences.Count == 0)
    {
        throw new ArgumentException($"{tokenValidationSectionName}:Audiences requires at least one value");
    }

    bool isGov = tokenValidationSection.GetValue("IsGov", false);
    bool azureBotServiceTokenHandling = tokenValidationSection.GetValue("AzureBotServiceTokenHandling", true);

    // If the `AzureBotServiceOpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`. This is what is used to authenticate ABS tokens.
    string azureBotServiceOpenIdMetadataUrl = tokenValidationSection["AzureBotServiceOpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(azureBotServiceOpenIdMetadataUrl))
    {
        azureBotServiceOpenIdMetadataUrl = isGov ? AuthenticationConstants.GovAzureBotServiceOpenIdMetadataUrl : AuthenticationConstants.PublicAzureBotServiceOpenIdMetadataUrl;
    }

    // If the `OpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`. This is what is used to authenticate Entra ID tokens.
    string openIdMetadataUrl = tokenValidationSection["OpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(openIdMetadataUrl))
    {
        openIdMetadataUrl = isGov ? AuthenticationConstants.GovOpenIdMetadataUrl : AuthenticationConstants.PublicOpenIdMetadataUrl;
    }

    TimeSpan openIdRefreshInterval = tokenValidationSection.GetValue("OpenIdMetadataRefresh", BaseConfigurationManager.DefaultAutomaticRefreshInterval);

    _ = services.AddAuthentication(options =>
    {
        options.DefaultAuthenticateScheme = JwtBearerDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = JwtBearerDefaults.AuthenticationScheme;
    })
    .AddJwtBearer(options =>
    {
        options.SaveToken = true;
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true, // this option enables validation of the audience claim against the configured audiences
            ValidateLifetime = true,
            ClockSkew = TimeSpan.FromMinutes(5),
            ValidIssuers = validTokenIssuers,
            ValidAudiences = audiences,
            ValidateIssuerSigningKey = true,
            RequireSignedTokens = true,
        };

        // Using Microsoft.IdentityModel.Validators
        options.TokenValidationParameters.EnableAadSigningKeyIssuerValidation();

        options.Events = new JwtBearerEvents
        {
            // Create a ConfigurationManager based on the requestor. This is to handle ABS non-Entra tokens.
            OnMessageReceived = async context =>
            {
                string authorizationHeader = context.Request.Headers.Authorization.ToString();

                if (string.IsNullOrEmpty(authorizationHeader))
                {
                    // Default to AadTokenValidation handling
                    // context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    // await Task.CompletedTask.ConfigureAwait(false);
                    // return;
                    //
                    // Fail the request when the token is empty
                    context.Fail("Authorization header is missing.");
                    logger?.LogWarning("Authorization header is missing.");
                    return;
                }

                string[] parts = authorizationHeader?.Split(' ');
                if (parts.Length != 2 || parts[0] != "Bearer")
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

                JwtSecurityToken token = new(parts[1]);
                string issuer = token.Claims.FirstOrDefault(claim => claim.Type == AuthenticationConstants.IssuerClaim)?.Value;

                if (azureBotServiceTokenHandling && AuthenticationConstants.BotFrameworkTokenIssuer.Equals(issuer))
                {
                    // Use the Bot Framework authority for this configuration manager
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(azureBotServiceOpenIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(azureBotServiceOpenIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }
                else
                {
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(openIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(openIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }

                await Task.CompletedTask.ConfigureAwait(false);
            },
            OnTokenValidated = context =>
            {
                logger?.LogDebug("TOKEN Validated");
                return Task.CompletedTask;
            },
            OnForbidden = context =>
            {
                logger?.LogWarning("Forbidden: {m}", context.Result.ToString());
                return Task.CompletedTask;
            },
            OnAuthenticationFailed = context =>
            {
                logger?.LogWarning("Auth Failed {m}", context.Exception.ToString());
                return Task.CompletedTask;
            }
        };
    });
}
```

Next, let's implement "2. Deny access from unknown Entra ID tenants". We can retrieve the Tenant ID inside the MessageActivityAsync method of Bot/WeatherAgentBot.cs. Let's extend the logic by referring to the following sample code:

https://github.com/OfficeDev/microsoft-teams-apps-company-communicator/blob/dcf3b169084d3fff7c1e4c5b68718fb33c3391dd/Source/CompanyCommunicator/Bot/CompanyCommunicatorBotFilterMiddleware.cs#L44

Here is how you can extend the logic to retrieve and use the Tenant ID within the MessageActivityAsync method:

```csharp
using MyM365Agent1.Bot.Agents;
using Microsoft.Agents.Builder;
using Microsoft.Agents.Builder.App;
using Microsoft.Agents.Builder.State;
using Microsoft.Agents.Core.Models;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.Extensions.DependencyInjection.Extensions;

namespace MyM365Agent1.Bot;

public class WeatherAgentBot : AgentApplication
{
    private WeatherForecastAgent _weatherAgent;
    private Kernel _kernel;
    private readonly string _tenantId;
    private readonly ILogger<WeatherAgentBot> _logger;

    public WeatherAgentBot(AgentApplicationOptions options, Kernel kernel, IConfiguration configuration, ILogger<WeatherAgentBot> logger) : base(options)
    {
        _kernel = kernel ?? throw new ArgumentNullException(nameof(kernel));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));

        OnConversationUpdate(ConversationUpdateEvents.MembersAdded, WelcomeMessageAsync);
        OnActivity(ActivityTypes.Message, MessageActivityAsync, rank: RouteRank.Last);

        // Get TenantId from TokenValidation section
        var tokenValidationSection = configuration.GetSection("TokenValidation");
        _tenantId = tokenValidationSection["TenantId"];
    }

    protected async Task MessageActivityAsync(ITurnContext turnContext, ITurnState turnState, CancellationToken cancellationToken)
    {
        // add validation of tenant ID
        var activity = turnContext.Activity;

        // Log: Received activity
        _logger.LogInformation("Received message activity from: {FromId}, AadObjectId:{AadObjectId}, TenantId: {TenantId}, ChannelId: {ChannelId}, ConversationType: {ConversationType}",
            activity?.From?.Id, activity?.From?.AadObjectId, activity?.Conversation?.TenantId, activity?.ChannelId, activity?.Conversation?.ConversationType);

        if (activity.ChannelId != "msteams" // Ignore messages not from Teams
            || activity.Conversation?.ConversationType?.ToLowerInvariant() != "personal" // Ignore messages from team channels or group chats
            || string.IsNullOrEmpty(activity.From?.AadObjectId) // Ignore if not an AAD user (e.g., bots, guest users)
            || (!string.IsNullOrEmpty(_tenantId) && !string.Equals(activity.Conversation?.TenantId, _tenantId, StringComparison.OrdinalIgnoreCase))) // Ignore if tenant ID does not match
        {
            _logger.LogWarning("Unauthorized serviceUrl detected: {ServiceUrl}. Expected to contain TenantId: {TenantId}", activity?.ServiceUrl, _tenantId);
            await turnContext.SendActivityAsync("Unauthorized service URL.", cancellationToken: cancellationToken);
            return;
        }

        // Setup local service connection
        ServiceCollection serviceCollection = [
            new ServiceDescriptor(typeof(ITurnState), turnState),
            new ServiceDescriptor(typeof(ITurnContext), turnContext),
            new ServiceDescriptor(typeof(Kernel), _kernel),
        ];

        // Start a Streaming Process
        await turnContext.StreamingResponse.QueueInformativeUpdateAsync("Working on a response for you");

        ChatHistory chatHistory = turnState.GetValue("conversation.chatHistory", () => new ChatHistory());
        _weatherAgent = new WeatherForecastAgent(_kernel, serviceCollection.BuildServiceProvider());

        // Invoke the WeatherForecastAgent to process the message
        WeatherForecastAgentResponse forecastResponse = await _weatherAgent.InvokeAgentAsync(turnContext.Activity.Text, chatHistory);
        if (forecastResponse == null)
        {
            turnContext.StreamingResponse.QueueTextChunk("Sorry, I couldn't get the weather forecast at the moment.");
            await turnContext.StreamingResponse.EndStreamAsync(cancellationToken);
            return;
        }

        // Create a response message based on the response content type from the WeatherForecastAgent
        // Send the response message back to the user.
        switch (forecastResponse.ContentType)
        {
            case WeatherForecastAgentResponseContentType.Text:
                turnContext.StreamingResponse.QueueTextChunk(forecastResponse.Content);
                break;
            case WeatherForecastAgentResponseContentType.AdaptiveCard:
                turnContext.StreamingResponse.FinalMessage = MessageFactory.Attachment(new Attachment()
                {
                    ContentType = "application/vnd.microsoft.card.adaptive",
                    Content = forecastResponse.Content,
                });
                break;
            default:
                break;
        }

        await turnContext.StreamingResponse.EndStreamAsync(cancellationToken); // End the streaming response
    }

    protected async Task WelcomeMessageAsync(ITurnContext turnContext, ITurnState turnState, CancellationToken cancellationToken)
    {
        foreach (ChannelAccount member in turnContext.Activity.MembersAdded)
        {
            if (member.Id != turnContext.Activity.Recipient.Id)
            {
                await turnContext.SendActivityAsync(MessageFactory.Text("Hello and Welcome! I'm here to help with all your weather forecast needs!"), cancellationToken);
            }
        }
    }
}
```

Since we're at it, I've added various validations as well. I hope this will be helpful as a reference for everyone.

## Important Changes to App Service Managed Certificates: Is Your Certificate Affected?
### Overview

As part of an upcoming industry-wide change, DigiCert, the Certificate Authority (CA) for Azure App Service Managed Certificates (ASMC), is required to migrate to a new validation platform to meet multi-perspective issuance corroboration (MPIC) requirements. While most certificates will not be impacted by this change, certain site configurations and setups may prevent certificate issuance or renewal starting July 28, 2025.

### Update (August 5, 2025)

We've published a Microsoft Learn documentation page titled "App Service Managed Certificate (ASMC) changes – July 28, 2025" that contains more in-depth mitigation guidance and a growing FAQ section to support the changes outlined in this blog post. While the blog currently contains the most complete overview, the documentation will soon be updated to reflect all blog content. Going forward, any new information or clarifications will be added to the documentation page, so we recommend bookmarking it for the latest guidance.

### What Will the Change Look Like?

- For most customers: No disruption. Certificate issuance and renewals will continue as expected for eligible site configurations.
- For impacted scenarios: Certificate requests will fail (no certificate issued) starting July 28, 2025, if your site configuration is not supported. Existing certificates will remain valid until their expiration (up to six months after last renewal).

### Impacted Scenarios

You will be affected by this change if any of the following apply to your site configurations:

1. **Your site is not publicly accessible.** Public accessibility to your app is required. If your app is only accessible privately (e.g., requiring a client certificate for access, disabling public network access, using private endpoints or IP restrictions), you will not be able to create or renew a managed certificate. Other site configurations or setup methods not explicitly listed here that restrict public access, such as firewalls, authentication gateways, or any custom access policies, can also impact eligibility for managed certificate issuance or renewal.
   Action: Ensure your app is accessible from the public internet. However, if you need to limit access to your app, then you must acquire your own SSL certificate and add it to your site.
2. **Your site uses Azure Traffic Manager "nested" or "external" endpoints.** Only "Azure Endpoints" on Traffic Manager will be supported for certificate creation and renewal. "Nested endpoints" and "External endpoints" will not be supported.
   Action: Transition to using "Azure Endpoints". However, if you cannot, then you must obtain a different SSL certificate for your domain and add it to your site.
3. **Your site relies on a `*.trafficmanager.net` domain.** Certificates for `*.trafficmanager.net` domains will not be supported for creation or renewal.
   Action: Add a custom domain to your app and point the custom domain to your `*.trafficmanager.net` domain. After that, secure the custom domain with a new SSL certificate.

If none of the above applies, no further action is required.

### How to Identify Impacted Resources?

To assist with the upcoming changes, you can use Azure Resource Graph (ARG) queries to help identify resources that may be affected under each scenario. Please note that these queries are provided as a starting point and may not capture every configuration. Review your environment for any unique setups or custom configurations.
#### Scenario 1: Sites Not Publicly Accessible

This ARG query retrieves a list of sites that either have the public network access property disabled or are configured to use client certificates. It then filters for sites that are using App Service Managed Certificates (ASMC) for their custom hostname SSL bindings. These certificates are the ones that could be affected by the upcoming changes. However, please note that this query does not provide complete coverage, as there may be additional configurations impacting public access to your app that are not included here. Ultimately, this query serves as a helpful guide, but a thorough review of your environment is recommended. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.

```kusto
// ARG Query: Identify App Service sites that commonly restrict public access and use ASMC for custom hostname SSL bindings
resources
| where type == "microsoft.web/sites"
// Extract relevant properties for public access and client certificate settings
| extend publicNetworkAccess = tolower(tostring(properties.publicNetworkAccess)), clientCertEnabled = tolower(tostring(properties.clientCertEnabled))
// Filter for sites that either have public network access disabled
// or have client certificates enabled (both can restrict public access)
| where publicNetworkAccess == "disabled" or clientCertEnabled != "false"
// Expand the list of SSL bindings for each site
| mv-expand hostNameSslState = properties.hostNameSslStates
| extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint)
// Only consider custom domains (exclude default *.azurewebsites.net) and sites with an SSL certificate bound
| where tolower(hostName) !endswith "azurewebsites.net" and isnotempty(thumbprint)
// Select key site properties for output
| project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint, publicNetworkAccess, clientCertEnabled
// Join with certificates to find only those using App Service Managed Certificates (ASMC)
// ASMCs are identified by the presence of the "canonicalName" property
| join kind=inner (
    resources
    | where type == "microsoft.web/certificates"
    | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName)
    // Only ASMC uses the "canonicalName" property
    | where isnotempty(canonicalName)
    | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
) on $left.thumbprint == $right.certThumbprint
// Final output: sites with restricted public access and using ASMC for custom hostname SSL bindings
| project siteName, siteId, siteResourceGroup, publicNetworkAccess, clientCertEnabled, thumbprint, certName, certId, certResourceGroup, certExpiration, canonicalName
```

#### Scenario 2: Traffic Manager Endpoint Types

For this scenario, please manually review your Traffic Manager profile configurations to ensure only "Azure Endpoints" are in use. We recommend inspecting your Traffic Manager profiles directly in the Azure portal or using relevant APIs to confirm your setup and ensure compliance with the new requirements.
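If you prefer to check endpoint types from a terminal rather than clicking through each profile, the following is a minimal sketch using the Azure CLI. The resource group and profile names are placeholders, and the portal remains the authoritative view of your configuration.

```bash
#!/usr/bin/env bash
# Minimal sketch: list Traffic Manager endpoints and their types for one profile.
# Endpoint types appear as .../azureEndpoints, .../externalEndpoints, or .../nestedEndpoints;
# per this post, only Azure endpoints remain supported for managed certificate issuance.

RG="my-resource-group"      # placeholder
PROFILE="my-tm-profile"     # placeholder

az network traffic-manager endpoint list \
  --resource-group "$RG" \
  --profile-name "$PROFILE" \
  --query "[].{name:name, type:type}" \
  --output table
```

Running this per profile (or looping over `az network traffic-manager profile list`) gives a quick inventory of any nested or external endpoints that would need to transition.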
In addition, it also checks whether any web apps are currently using those certificates for custom domain SSL bindings. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.

// ARG Query: Identify App Service Managed Certificates (ASMC) issued to *.trafficmanager.net domains
// Also checks if any web apps are currently using those certificates for custom domain SSL bindings
resources
| where type == "microsoft.web/certificates"
// Extract the certificate thumbprint and canonicalName (ASMCs have a canonicalName property)
| extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property
// Filter for certificates issued to *.trafficmanager.net domains
| where canonicalName endswith "trafficmanager.net"
// Select key certificate properties for output
| project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
// Join with web apps to see if any are using these certificates for SSL bindings
| join kind=leftouter (
    resources
    | where type == "microsoft.web/sites"
    // Expand the list of SSL bindings for each site
    | mv-expand hostNameSslState = properties.hostNameSslStates
    | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint)
    // Only consider bindings for *.trafficmanager.net custom domains with a certificate bound
    | where tolower(hostName) endswith "trafficmanager.net" and isnotempty(thumbprint)
    // Select key site properties for output
    | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint
) on $left.certThumbprint == $right.thumbprint
// Final output: ASMCs for *.trafficmanager.net domains and any web apps using them
| project certName, certId, certResourceGroup, certExpiration, canonicalName, siteName, siteId, siteResourceGroup
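If the query surfaces impacted certificates, two follow-up steps from the Azure CLI can help with planning and remediation. This is a rough sketch rather than part of the official guidance: the field names assume a recent Azure CLI (canonicalName is only populated for managed certificates), and the app, resource group, domain, and thumbprint values are placeholders.

# List managed certificates in a resource group with their expiration dates
az webapp config ssl list --resource-group <resource-group> \
  --query "[?canonicalName].{name:name, canonicalName:canonicalName, expires:expirationDate}" \
  --output table

# For Scenario 3: add a custom domain (DNS should already point it at the app or Traffic Manager profile)
az webapp config hostname add --webapp-name <app-name> --resource-group <resource-group> --hostname www.contoso.com

# Then bind your own certificate (previously uploaded or imported from Key Vault) to the app
az webapp config ssl bind --name <app-name> --resource-group <resource-group> \
  --certificate-thumbprint <thumbprint> --ssl-type SNI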
Ongoing Updates
We will continue to update this post with any new queries or important changes as they become available. Be sure to check back for the latest information.

Note on Comments
We hope this information helps you navigate the upcoming changes. To keep this post clear and focused, comments are closed. If you have questions, need help, or want to share tips or alternative detection methods, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.

Superfast using Web App and Managed Identity to invoke Function App triggers

TOC Introduction Setup References

1. Introduction
Many enterprises prefer not to use App Keys to invoke Function App triggers, as they are concerned that these fixed strings might be exposed. This method allows you to invoke Function App triggers using Managed Identity for enhanced security. I will provide examples in both Bash and Node.js.

2. Setup
1. Create a Linux Python 3.11 Function App
1.1. Configure Authentication to block unauthenticated callers while allowing the Web App’s Managed Identity to authenticate.
Identity Provider: Microsoft
Choose a tenant for your application and its users: Workforce Configuration
App registration type: Create
Name: [automatically generated]
Client Secret expiration: [fit-in your business purpose]
Supported Account Type: Any Microsoft Entra Directory - Multi-Tenant
Client application requirement: Allow requests from any application
Identity requirement: Allow requests from any identity
Tenant requirement: Use default restrictions based on issuer
Token store: [checked]
1.2. Create an anonymous trigger. Since your app is already protected by App Registration, additional Function App-level protection is unnecessary; otherwise, you will need a Function Key to trigger it.
1.3. Once the Function App is configured, try accessing the endpoint directly—you should receive a 401 Unauthorized error, confirming that triggers cannot be accessed without proper Managed Identity authorization.
1.4. After making these changes, wait 10 minutes for the settings to take effect.
2. Create a Linux Node.js 20 Web App, Obtain an Access Token, and Invoke the Function App Trigger (Bash Example)
2.1. Enable System Assigned Managed Identity in the Web App settings.
2.2. Open the Kudu SSH Console for the Web App.
2.3. Run the following commands, making the necessary modifications:
subscriptionsID → Replace with your Subscription ID.
resourceGroupsID → Replace with your Resource Group ID.
application_id_uri → Replace with the Application ID URI from your Function App’s App Registration.
https://az-9640-myfa.azurewebsites.net/api/my_test_trigger → Replace with the corresponding Function App trigger URL.

# Please set up the target resource to yours
subscriptionsID="01d39075-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
resourceGroupsID="XXXX"

# Variable Setting (No need to change)
identityEndpoint="$IDENTITY_ENDPOINT"
identityHeader="$IDENTITY_HEADER"
application_id_uri="api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# Install necessary tool
apt install -y jq

# Get Access Token
tokenUri="${identityEndpoint}?resource=${application_id_uri}&api-version=2019-08-01"
accessToken=$(curl -s -H "Metadata: true" -H "X-IDENTITY-HEADER: $identityHeader" "$tokenUri" | jq -r '.access_token')
echo "Access Token: $accessToken"

# Run Trigger
response=$(curl -s -o response.json -w "%{http_code}" -X GET "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger" -H "Authorization: Bearer $accessToken")
echo "HTTP Status Code: $response"
echo "Response Body:"
cat response.json

2.4. If everything is set up correctly, you should see a successful invocation result.
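If the call in step 2.3 returns a 401 even though a token was acquired, a common cause is a token issued for the wrong audience. The following optional check is not part of the original walkthrough; it is a small sketch that decodes the token payload so you can confirm the aud claim matches your Application ID URI (claim names can vary by token type).

# Decode the JWT payload (the second dot-separated segment) to inspect its claims
payload=$(echo "$accessToken" | cut -d '.' -f2 | tr '_-' '/+')

# Restore the base64 padding that JWTs omit
case $(( ${#payload} % 4 )) in
  2) payload="${payload}==" ;;
  3) payload="${payload}=" ;;
esac

# The aud claim should match the Application ID URI used to request the token
echo "$payload" | base64 -d | jq '{aud: .aud, appid: .appid, exp: .exp}'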
3. Invoke the Function App Trigger Using Web App (Node.js Example)
I have also provided a Node.js example, which you can modify accordingly, save to /home/site/wwwroot/callFunctionApp.js, and run:

cd /home/site/wwwroot/
vi callFunctionApp.js
npm init -y
npm install @azure/identity axios
node callFunctionApp.js

// callFunctionApp.js
const { DefaultAzureCredential } = require("@azure/identity");
const axios = require("axios");

async function callFunctionApp() {
  try {
    const applicationIdUri = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; // Change here
    const credential = new DefaultAzureCredential();

    console.log("Requesting token...");
    const tokenResponse = await credential.getToken(applicationIdUri);
    if (!tokenResponse || !tokenResponse.token) {
      throw new Error("Failed to acquire access token");
    }
    const accessToken = tokenResponse.token;
    console.log("Token acquired:", accessToken);

    const apiUrl = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"; // Change here
    console.log("Calling the API now...");
    const response = await axios.get(apiUrl, {
      headers: {
        Authorization: `Bearer ${accessToken}`,
      },
    });

    console.log("HTTP Status Code:", response.status);
    console.log("Response Body:", response.data);
  } catch (error) {
    console.error("Failed to call the function", error.response ? error.response.data : error.message);
  }
}

callFunctionApp();

Below is my execution result:

3. References
Tutorial: Managed Identity to Invoke Azure Functions | Microsoft Learn
How to Invoke Azure Function App with Managed Identity | by Krizzia 🤖 | Medium
Configure Microsoft Entra authentication - Azure App Service | Microsoft Learn

Validating Change Requests with Kubernetes Admission Controllers
Promoting an application or infrastructure change into production often comes with a requirement to follow a change control process. This ensures that changes to production are properly reviewed and that they adhere to required approvals, change windows and QA process. Often this change request (CR) process will be conducted using a system for recording and auditing the change request and the outcome. When deploying a release, there will often be places in the process to go through this change control workflow. This may be as part of a release pipeline, it may be managed in a pull request or it may be a manual process. Ultimately, by the time the actual changes are made to production infrastructure or applications, they should already be approved. This relies on the appropriate controls and restrictions being in place to make sure this happens. When it comes to the point of deploying resources into production Kubernetes clusters, they should have already been through a CR process. However, what if you wanted a way to validate that this is the case, and block anything from being deployed that does not have an approved CR, providing a backstop to ensure that no unapproved resources get deployed? Let's take a look at how we can use an Admission Controller to do this. Admission Controllers A Kubernetes Admission Controller is a mechanism to provide a checkpoint during a deployment that validates resources and applies rules and policies before this resource is accepted into the cluster. Any request to create, update or delete (CRUD) a resource is first run through any applicable admission controllers to check if it violates any of the required rules. Only if all admission controllers allow the request is it then processed. Kubernetes includes some built-in admission controllers, but you can also create your own. Admission controllers are essentially webhooks that are registered with the Kubernetes API server. When a CRUD request is processed by the API server, it calls any of these webhooks that are registered, and processes the response. When creating your own Admission controller, you would usually implement the webhook as a pod running in the cluster. There are three types of Admission Controller webhooks: MutatingAdmissionWebhook: Can modify the incoming object before it is persisted (e.g., injecting sidecars). ValidatingAdmissionWebhook: Can only approve or reject the request based on validation logic. ValidatingAdmissionPolicy: Validation logic is embedded in the API, rather than requiring a separate web service For our scenario we are going to look at using a ValidatingAdmissionWebhook, as we only want to approve or reject a request based on its change request status. Sample Code In this article, we are not going to go line by line through the code for this admission controller, however you can see an example implementation of this in this repo. In this example, we do not build out the full web service for validating change requests themselves. We have some pre-defined CR IDs with pre-configured statuses returned by the application. In a real world implementation your web service would call out to your change management solution to get the current status of the change request. This does not impact how you would build the Admission Controller, just the business logic inside your controller. 
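Before looking at the individual components, it helps to see the request/response shape the webhook has to speak. The sketch below is not taken from the sample repo: it hand-crafts a minimal AdmissionReview and posts it to the controller, assuming the service is running locally on a hypothetical https://localhost:8443/admit endpoint.

# Minimal AdmissionReview request for a Deployment carrying a change ID annotation
cat > review.json <<'EOF'
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "request": {
    "uid": "11111111-2222-3333-4444-555555555555",
    "operation": "CREATE",
    "object": {
      "apiVersion": "apps/v1",
      "kind": "Deployment",
      "metadata": {
        "name": "demo-app",
        "namespace": "demo",
        "annotations": { "change.company.com/id": "CHG-2025-002" }
      }
    }
  }
}
EOF

# Post it to the locally running webhook and print the allow/deny decision
curl -sk -X POST https://localhost:8443/admit \
  -H "Content-Type: application/json" \
  -d @review.json | jq '{allowed: .response.allowed, message: .response.status.message}'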
Components
Our Admission Controller consists of several components:

Application
Our actual admission controller application, which runs an HTTP service that receives the request from the API Server calling the webhook, processes it and applies business logic, and returns a response. In our example this service has been written in Go, but you can use whatever language you like. Your service must meet the API contract defined for the admission webhook. Our application does the following:
Reads the incoming change body YAML and extracts the Change ID from the change.company.com/id annotation that should be applied to the resource. We also support the argocd.argoproj.io/change-id and deployment.company.com/change-id annotations.

func extractChangeID(req *admissionv1.AdmissionRequest) string {
    // Try to extract change ID from object annotations
    obj := req.Object.Raw
    var objMap map[string]interface{}
    if err := json.Unmarshal(obj, &objMap); err != nil {
        return ""
    }

    if metadata, ok := objMap["metadata"].(map[string]interface{}); ok {
        if annotations, ok := metadata["annotations"].(map[string]interface{}); ok {
            // Look for change ID in various annotation formats
            if changeID, ok := annotations["change.company.com/id"].(string); ok {
                return changeID
            }
            if changeID, ok := annotations["argocd.argoproj.io/change-id"].(string); ok {
                return changeID
            }
            if changeID, ok := annotations["deployment.company.com/change-id"].(string); ok {
                return changeID
            }
        }
    }

    return ""
}

If it does not find the required annotation, it immediately fails the validation, as no CR is present.

if changeID == "" {
    // Reject resources without change ID annotation
    klog.Infof("No change ID found, rejecting request")
    ac.respond(w, &admissionReview, false, "Change ID annotation is required")
    return
}

If the CR is present, it validates it. In our demo application this is checked against a hard-coded list of CRs, but in the real world, this is where you would make a call out to your external change management solution to get the CR with that ID. There are three possible outcomes here:
The CR ID does not match an ID in our system, the validation fails
The CR does match an ID in our system, but this CR is not approved, the validation fails
The CR does match an ID in our system and this CR has been approved, the validation passes and the resources are created.

changeRecord, err := ac.changeService.ValidateChange(changeID)
if err != nil {
    klog.Errorf("Change validation failed: %v", err)
    ac.respond(w, &admissionReview, false, fmt.Sprintf("Change validation failed: %v", err))
    return
}

if !changeRecord.Approved {
    klog.Infof("Change %s is not approved (status: %s)", changeID, changeRecord.Status)
    ac.respond(w, &admissionReview, false, fmt.Sprintf("Change %s is not approved (status: %s)", changeID, changeRecord.Status))
    return
}

klog.Infof("Change %s is approved, allowing deployment", changeID)
ac.respond(w, &admissionReview, true, fmt.Sprintf("Change %s approved by %s", changeID, changeRecord.Requester))

Container
To run our Admission Controller inside the AKS cluster we need to create a Docker container that runs our application. In the sample code you will find a Dockerfile used to build this container. We then push the container to a Docker registry, so we can consume the image when we run the webhook service.
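How you publish the image depends on your registry. A hedged sketch for Azure Container Registry follows, where the registry name and tag are placeholders rather than values from the sample repo.

# Build the admission controller image from the repo's Dockerfile
docker build -t admission-controller:v1 .

# Tag and push to an Azure Container Registry (myregistry is a placeholder)
az acr login --name myregistry
docker tag admission-controller:v1 myregistry.azurecr.io/admission-controller:v1
docker push myregistry.azurecr.io/admission-controller:v1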
Kubernetes Resources
To run our Docker container and set up a URL that the API server can call, we will deploy:
A Kubernetes Deployment
A Kubernetes Service
A set of RBAC roles and bindings to grant access to the Admission Controller
Finally, we will deploy the ValidatingWebhookConfiguration resource itself, which registers our validating admission webhook. This resource tells the API server:
Where to call the webhook
Which operations should require calling the webhook - in our demo application we look at create and update operations. If you wanted to validate that delete operations had a CR, you could also add that
Which resource types need to be validated - in our demo we are looking at Deployments, Services and Configmaps, but you could make this as wide or narrow as you require
Which namespaces to validate - we added a condition that only applies this validation to namespaces that have a label of change-validation set to enabled, this way we can control where this is applied and avoid applying it to things like system namespaces. This is very important to ensure you don't break your core Kubernetes infrastructure. This also allows for differentiation between development and production namespaces, where you likely would not want to require Change Requests in development.
Finally, we define what happens when the validation fails. There are two options:
Fail, which blocks the resource creation
Ignore, which ignores the failure and allows the resource to be created

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: change-validation-webhook
webhooks:
  - name: change-validation.company.com
    clientConfig:
      service:
        name: admission-controller
        namespace: admission-controller
        path: "/admit"
      # caBundle with the CA certificate for the webhook's TLS endpoint goes here
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["services", "configmaps"]
    namespaceSelector:
      matchLabels:
        change-validation: "enabled"
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    failurePolicy: Fail
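With the manifests in place, it is the namespace label that actually switches enforcement on. The following is a short sketch of the wiring steps; the manifests/ folder name is a placeholder, and the demo namespace matches the one used in the walkthrough below.

# Apply the controller's Deployment, Service, RBAC and webhook configuration
kubectl apply -f manifests/

# Opt the namespace in to change validation (the label the namespaceSelector matches on)
kubectl label namespace demo change-validation=enabled

# Confirm the webhook is registered and see which namespaces are opted in
kubectl get validatingwebhookconfigurations
kubectl get namespaces -l change-validation=enabled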
Admission Controller In Action
Now that we have our admission controller set up, let's attempt to make a change to a resource. Using a Kubernetes Deployment resource, we will attempt to change the number of replicas from three to two. For this resource, the change.company.com/id annotation is set to CHG-2025-000, which is a change request that doesn't exist in our change management system.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: demo
  annotations:
    change.company.com/id: "CHG-2025-000"
  labels:
    app: demo-app
    environment: development
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app

Once we attempt to deploy this, we will quickly see that the request to update the resource is denied:

one or more objects failed to apply, reason: error when patching "/dev/shm/1236013741": admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found,admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.

Similarly, if we change the annotation to CHG-2025-999, which is a change request that does exist but has not been approved, we again see that the request is denied, but this time the error is clear that it is not approved:

one or more objects failed to apply, reason: error when patching "/dev/shm/28290353": admission webhook "change-validation.company.com" denied the request: Change CHG-2025-999 is not approved (status: pending),admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.

Finally, we update the annotation to CHG-2025-002, which has been approved. This time our deployment update succeeds and the number of replicas has been reduced to two.

Next Steps
What we have created so far works as a Proof of Concept to confirm that using an Admission Controller for this job will work. To move this into production use, we'd need to take a few more steps:
Update our web API to call out to our external change management solution and retrieve real change requests
Implement proper security for the Admission Controller with SSL certificates and network restrictions inside the cluster
Implement high availability with multiple replicas to ensure the service is always able to respond to requests
Implement monitoring and log collection for our service to ensure we are aware of any issues
Automate the build and release of this solution, including implementing its own set of change controls!

Conclusions
Controlling updates into production through a change control process is vital for stable, secure and audited production environments. Ideally these CR processes will happen early in the release pipeline in a clear, automated process that avoids getting to the point where anyone tries to deploy unapproved changes into production. However, if you want to ensure that this cannot happen, and put safeguards in place so that unapproved changes are always blocked, then the use of Admission Controllers is one way to do this. Creating a custom Admission Controller is relatively straightforward and it allows you to integrate your business processes into the decision on whether a resource can be deployed or not. A change control Admission Controller should not be your only change control process, but it can form part of your layers of control and audit.

Further Reading
Sample Code
Admission Control in Kubernetes
Manage Change in the Cloud Adoption Framework
Throughput Testing at Scale for Azure Functions

Introduction
Ensuring reliable, high-performance serverless applications is central to our work on Azure Functions. With new plans like Flex Consumption expanding the platform’s capabilities, it's critical to continuously validate that our infrastructure can scale—reliably and efficiently—under real-world load. To meet that need, we built PerfBench (Performance Benchmarker), a comprehensive benchmarking system designed to measure, monitor, and maintain our performance baselines—catching regressions before they impact customers. This infrastructure now runs close to 5,000 test executions every month, spanning multiple SKUs, regions, runtimes, and workloads—with Flex Consumption accounting for more than half of the total volume. This scale of testing helps us not only identify regressions early, but also understand system behavior over time across an increasingly diverse set of scenarios.

Figure: … of all Python Function apps across regions (SKU: Flex Consumption, Instance Size: 2048 – 1000 VUs over 5 mins, HTML Parsing test)

Motivation: Why We Built PerfBench
The Need for Scale
Azure Functions supports a range of triggers, from HTTP requests to event-driven flows like Service Bus or Storage Queue messages. With an ever-growing set of runtimes (e.g., .NET, Node.js, Python, Java, PowerShell) and versions (like Python 3.11 or .NET 8.0), multiple SKUs and regions, the possible test combinations explode quickly. Manual testing or single-scenario benchmarks no longer cut it. The table below shows the current scope of coverage tests.

Plan                 PricingTier   DistinctTestName
FlexConsumption      FLEX2048      110
FlexConsumption      FLEX512       20
Consumption          CNS           36
App Service Plan     P1V3          32
Functions Premium    EP1           46

Table 1: Different test combinations per plan based on Stack, Pricing Tier, Scenario, etc. This doesn’t include the ServiceBus tests.

The Flex Consumption Plan
There have been many iterations of this infrastructure within the team, and we’ve been continuously monitoring the Functions performance for more than 4 years now, with more than a million runs to date. But with the introduction of the Flex Consumption plan (Preview at the time of building PerfBench), we had to redesign the testing from the ground up, as Flex Consumption unlocks new scaling behaviors and needed thorough testing—millions of messages or tens of thousands of requests per second—to ensure confidence in performance goals and regression prevention.

PerfBench: High-Level Architecture
Overview
PerfBench is composed of several key pieces:
Resource Creator – Uses meta files and Bicep templates to deploy receiver function apps (test targets) at scale.
Test Infra Generator – Deploys and configures the system that actually does the load generation (e.g., SBLoadGen function app, Scheduler function app, ALT webhook function).
Test Infra – The “brain” of testing, including the Scheduler, Azure Load Testing integration, and SBLoadGen.
Receiver Function Apps – Deployed once per combination of runtime, version, region, OS, SKU, and scenario.
Data Aggregation & Dashboards – Gathers test metrics from Azure Load Testing (ALT) or SBLoadGen, stores them in Azure Data Explorer (ADX), and displays trends in ADX dashboards.
Below is a simplified architecture diagram illustrating these components:

Components
Resource Creator
The resource creator uses meta files and Jinja templates to generate Bicep templates for creating resources.
Meta Files: We define test scenarios in simple text-based files (e.g., os.txt, runtime_version.txt, sku.txt, scenario.txt).
Each file lists possible values (like python|3.11 or dotnet|8.0) and short codes for resource naming.
Template Generation: A script reads these meta files and uses them to produce Bicep templates—one template per valid combination—deploying receiver function apps into dedicated resource groups.
Filters: Regex-like patterns in a filter.txt file exclude unwanted combos, keeping the matrix manageable.
CI/CD Flow: Whenever we add a new runtime or region, a pull request updates the relevant meta file. Once merged, our pipeline regenerates Bicep and redeploys resources (these are idempotent updates).

Test Infra Generator
Deploys and configures the Scheduler Function App, SBLoadGen Durable Functions app, and the ALT webhook function.
Similar CI/CD approach—merging changes triggers the creation (or update) of these infrastructure components.

Test Infra: Load Generation, Scheduling, and Reporting
Scheduler
The conductor of the whole operation: it runs every 5 minutes and loads test configurations (test_configs.json) from Blob Storage. The configuration includes details on what tests to run, at what time (e.g., “run at 13:45 daily”), and references to either ALT for HTTP tests or SBLoadGen for non-HTTP tests, so each can be scheduled on the right system. Some tests run multiple times daily, others once a day; a scheduled downtime is built in for maintenance.

HTTP Load Generator - Azure Load Testing (ALT)
We utilize Azure Functions to trigger Azure Load Testing (ALT) runs for HTTP-based scenarios. ALT is a production-grade load generation service that provides an easy-to-configure way to send load to different server endpoints using JMeter and Locust. We worked closely with the ALT team to optimize the JMeter scripts for different scenarios, and it recently completed its second year. We created an abstraction on top of ALT to provide a webhook approach for starting tests as well as getting notified when tests finish, and this was done using a custom function app that does the following:
Initiate a test run using a predefined JMX file.
Continuously poll until the test execution is complete.
Retrieve the test results and transform them into the required format.
Transmit the formatted results to the data aggregation system.

Sample ALT Test Run: 8.8 million requests in under 6 minutes, with a 90th percentile response time of 80ms and zero errors. The system maintained a throughput of 28K+ RPS.

A few more details on our ALT setup:
25 Runtime Controllers manage the test logic and concurrency.
40 Engines handle actual load execution, distributing test plans.
1,000 Clients total for 5-minute runs to measure throughput, error rates, and latency.
Test Types:
HelloWorld (GET request, to understand the baseline of the system).
HtmlParser (POST request sending HTML for parsing to simulate moderate CPU usage).

Service Bus Load Generator - SBLoadGen (Durable Functions)
For event-driven scenarios (e.g., Service Bus–based triggers), we built SBLoadGen. It’s a Durable Function that uses the fan-out pattern to distribute work across multiple workers—each responsible for sending a portion of the total load. In a typical run, we aim to generate around one million messages in under a minute to stress-test the system. We intentionally avoid a fan-in step—once messages are in-flight, the system defers to the receiver function apps to process and emit relevant telemetry.
Highlights:
Generates ~1 million messages in under a minute.
Durable Function apps are deployed regionally and are triggered via webhook.
Implemented as a Python Function App using Model V2.
Note: This will be open sourced in the coming days.

Receiver Function Apps (Test apps)
These are the actual apps receiving all the load generated. They are deployed with different combinations and updated rarely. Each valid combination (region + OS + runtime + SKU + scenario) gets its own function app, receiving load from ALT or SBLoadGen.
HTTP Scenarios:
HelloWorld: No-op test to measure the overhead of the system and establish a baseline.
HTML Parser: POST with an HTML document for parsing (simulating a small CPU load).
Non-HTTP (Service Bus) Scenario: CSV-to-JSON plus blob storage operations, blending compute and I/O overhead.
Collected Metrics:
RPS: Requests per second (RPS), success/error rates, latency distributions for HTTP workloads.
MPPS: Messages processed per second (MPPS), success/error rates for non-HTTP (e.g. Service Bus) workloads.

Data Aggregation & Dashboards
Capturing results at scale is just as important as generating load. PerfBenchV2 uses a modular data pipeline to reliably ingest and visualize metrics from both HTTP and Service Bus–based tests. All test results flow through Event Hubs, which act as an intermediary between the test infrastructure and our analytics platform. The webhook function (used with ALT) and the SBLoadGen app both emit structured logs that are routed through Event Hub streams and ingested into dedicated Azure Data Explorer (ADX) tables.
We use three main tables in ADX:
HTTPTestResults for test runs executed via Azure Load Testing.
SBLoadGenRuns for recording message counts and timing data from Service Bus scenarios.
SchedulerRuns to log when and how each test was initiated.
On top of this telemetry, we’ve built custom ADX dashboards that allow us to monitor trends in latency, throughput, and error rates over time. These dashboards provide clear, actionable views into system behavior across dozens of runtimes, regions, and SKUs. Because our focus is on long-term trend analysis, rather than real-time anomaly detection, this batch-oriented approach works well and reduces operational complexity.

CI/CD Pipeline Integration
Continuous Updates: Once a new language version or scenario is added to the runtime_version.txt or scenario.txt meta files, the pipeline regenerates Bicep and deploys new receiver apps. The Test Infra Generator also updates or redeploys the needed function apps (Scheduler, SBLoadGen, or ALT webhook) whenever logic changes.
Release Confidence: We run throughput tests on these new apps early and often, catching any performance regressions before shipping to customers.

Challenges & Lessons Learned
Designing and running this infrastructure hasn't been easy, and we've learned a lot of valuable lessons along the way. Here are a few:
Exploding Matrix - Handling every runtime, OS, SKU, region, and scenario can lead to thousands of permutations. Meta files and a robust filter system help keep this under control, but it remains an ongoing effort.
Cloud Transience - With ephemeral infrastructure, sometimes tests fail due to network hiccups or short-lived capacity constraints. We built in retries and redundancy to mitigate transient failures.
Early Adoption - PerfBench was among the first heavy “customers” of the new Flex Consumption plan. At times, we had to wait for Bicep features or platform fixes—but it gave us great insight into the plan’s real-world performance.
Maintenance & Cleanup - When certain stacks or SKUs near end-of-life, we have to decommission their resources—this also means regular grooming of meta files and filter rules.

Success Stories
Proactive Regression Detection: PerfBench surfaced critical performance regressions early—often before they could impact customers. These insights enabled timely fixes and gave us confidence to move forward with the General Availability of Flex Consumption.
Production-Level Confidence: By continuously running tests across live production regions, PerfBench provided a realistic view of system behavior under load. This allowed the team to fine-tune performance, eliminate bottlenecks, and achieve improvements measured in single-digit milliseconds.
Influencing Product Evolution: As one of the first large-scale internal adopters of the Flex Consumption plan, PerfBench served as a rigorous validation tool. The feedback it generated played a direct role in shaping feature priorities and improving platform reliability—well before broader customer adoption.

Future Directions
Open sourcing: We are in the process of open sourcing all the relevant parts of PerfBench - SBLoadGen, BicepTemplates generator, etc.
Production Synthetic Validation and Alerting: Adapting PerfBench’s resource generation approach for ongoing synthetic tests in production, ensuring real environments consistently meet performance SLOs. This will also open up alerting and monitoring scenarios across the production fleet.
Expanding Trigger Coverage and Variations: Exploring additional triggers like Storage queues or Event Hub triggers to broaden test coverage. Testing different settings within the same scenario (e.g., larger payloads, concurrency changes).

Conclusion
PerfBench underscores our commitment to high-performance Azure Functions. By automating test app creation (via meta files and Bicep), orchestrating load (via ALT and SBLoadGen), and collecting data in ADX, we maintain a continuous pulse on throughput. This approach has already proven invaluable for Flex Consumption, and we’re excited to expand scenarios and triggers in the future. For more details on Flex Consumption and other hosting plans, check out the Azure Functions Documentation. We hope the insights shared here spark ideas for your own large-scale performance testing needs — whether on Azure Functions or any other distributed cloud services.

Acknowledgements
We’d like to acknowledge the entire Functions Platform and Tooling teams for their foundational work in enabling this testing infrastructure. Special thanks to the Azure Load Testing (ALT) team for their continued support and collaboration. And finally, sincere appreciation to our leadership for making performance a first-class engineering priority across the stack.

Further Reading
Azure Functions
Azure Functions Flex Consumption Plan
Azure Durable Functions
Azure Functions Python Developer Reference Guide
Azure Functions Performance Optimizer
Example case study: GitHub and Azure Functions
Azure Load Testing Overview
Azure Data Explorer Dashboards

If you have any questions or want to share your own performance testing experiences, feel free to reach out in the comments!