Best practices

Let's move away from API keys!
35% of all exposed API keys are still active. That's a scary statistic: it means services are most likely being misused and data is actively being stolen. Let's try to understand why the concept of API keys is a problematic one and how we can move away from it.
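As a taste of what the alternative looks like, here is a minimal sketch (not from the post itself) of calling an API with a short-lived Microsoft Entra ID token instead of a stored key; the resource URI and endpoint are placeholders:

# A minimal sketch, assuming an API that accepts Microsoft Entra ID tokens.
# Instead of shipping a long-lived API key, request a short-lived token for
# the signed-in identity at call time and send it as a Bearer token.
token=$(az account get-access-token --resource "api://my-sample-api" --query accessToken -o tsv)

curl -s "https://my-sample-api.example.com/data" \
  -H "Authorization: Bearer $token"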
Announcing the General Availability of New Availability Zone Features for Azure App Service

What are Availability Zones?

Availability Zones, or zone redundancy, refers to the deployment of applications across multiple availability zones within an Azure region. Each availability zone consists of one or more data centers with independent power, cooling, and networking. By leveraging zone redundancy, you can protect your applications and data from data center failures, ensuring uninterrupted service.

Key Updates

- The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two, while still maintaining a 99.99% SLA.
- Many existing App Service plans with two or more instances will automatically support Availability Zones without additional setup.
- The zone redundant setting for App Service plans and App Service Environment v3 is now mutable throughout the life of the resources.
- Enhanced visibility into Availability Zone information, including physical zone placement and zone counts, is now provided.
- For App Service Environment v3, the minimum instance fee for enabling Availability Zones has been removed, aligning the pricing model with the multi-tenant App Service offering.

The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two. You can now enjoy the benefits of Availability Zones with just two instances, since we continue to uphold a 99.99% SLA even with the two-instance configuration. Many existing App Service plans with two or more instances will automatically support Availability Zones without necessitating additional setup. Over the past few years, efforts have been made to ensure that the App Service footprint supports Availability Zones wherever possible, and we've made significant gains in doing so. Therefore, many existing customers can enable Availability Zones on their current deployments without needing to redeploy.

Along with supporting the 2-instance Availability Zone configuration, we have enabled Availability Zones on the App Service footprint in regions where only two zones may be available. Previously, enabling Availability Zones required a region to have three zones with sufficient capacity. To account for growing demand, we now support Availability Zone deployments in regions with just two zones, which allows us to provide Availability Zone features across more regions. And with that, we are upholding the 99.99% SLA even with the 2-zone configuration.

Additionally, we are pleased to announce that the zone redundant setting (zoneRedundant property) for App Service plans and App Service Environment v3 is now mutable throughout the life of these resources. This enhancement allows customers on Premium V2, Premium V3, or Isolated V2 plans to toggle zone redundancy on or off as required, as sketched below. With this capability, you can reduce costs and scale to a single instance when multiple instances are not necessary. Conversely, you can scale out and enable zone redundancy at any time to meet your requirements. This ability has been requested for a while now and we are excited to finally make it available. For App Service Environment v3 users, this also means that your individual App Service plan zone redundancy status is now independent of other plans in your App Service Environment. You can therefore have a mix of zone redundant and non-zone redundant plans in an App Service Environment, something that was previously not supported.
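As an illustration of that mutability, here is a hedged Azure CLI sketch; the plan name, resource group, and the use of the generic --set syntax are assumptions, so check the linked documentation for the exact supported commands:

# A minimal sketch, assuming an existing Premium V3 plan; names are placeholders.
# The generic --set syntax updates the zoneRedundant property described above.

# Scale out and turn zone redundancy on (two or more instances required)
az appservice plan update \
  --name my-plan \
  --resource-group my-rg \
  --set zoneRedundant=true sku.capacity=2

# Later, turn it off and scale to a single instance to reduce cost
az appservice plan update \
  --name my-plan \
  --resource-group my-rg \
  --set zoneRedundant=false sku.capacity=1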
In addition to these new features, we also have a couple of other exciting things to share. We are now providing enhanced visibility into Availability Zone information, including the physical zone placement of your instances and zone counts. And for our App Service Environment v3 customers, we have removed the minimum instance fee for enabling Availability Zones, which means you now only pay for the Isolated V2 instances you consume. This aligns the pricing model with the multi-tenant App Service offering.

For more information as well as guidance on how to use these features, see the docs - Reliability in Azure App Service. Azure Portal support for these new features will be available by mid-June 2025. In the meantime, see the documentation to use these new features with ARM/Bicep or the Azure CLI. Also check out the BRK200 breakout session at Microsoft Build 2025, live on May 20th or anytime after via the recording, where my team and I will be discussing these new features and many more exciting announcements for Azure App Service. If you're in the Seattle area and attending Microsoft Build 2025 in person, come meet my team and me at our Expert Meetup Booth.

FAQ

Q: What are availability zones?
A: Availability zones are physically separate locations within an Azure region, each consisting of one or more data centers with independent power, cooling, and networking. Deploying applications across multiple availability zones ensures high availability and business continuity.

Q: How do I enable Availability Zones for my existing App Service plan or App Service Environment v3?
A: There is a new toggle in the Azure portal that will be enabled if your App Service plan or App Service Environment v3 supports Availability Zones. Your deployment must be on the App Service footprint that supports zones in order to have this capability. There is a new property called "MaximumNumberOfZones", which indicates the number of zones your deployment supports. If this value is greater than one, you are on the footprint that supports zones and can enable Availability Zones as long as you have two or more instances. If this value is equal to one, you need to redeploy. Note that we are continually working to expand the zone footprint across more App Service deployments.

Q: Is there an additional charge for Availability Zones?
A: There is no additional charge; you only pay for the instances you use. The only requirement is that you use two or more instances.

Q: Can I change the zone redundant property after creating my App Service plan?
A: Yes, the zone redundant property is now mutable, meaning you can toggle it on or off at any time.

Q: How can I verify the zone redundancy status of my App Service plans?
A: We now display the physical zone for each instance, helping you verify zone redundancy status for audits and compliance reviews.

Q: How do I use these new features?
A: You can use ARM/Bicep or the Azure CLI at this time. Starting in mid-June, Azure Portal support should be available. The documentation currently shows how to use ARM/Bicep and the Azure CLI to enable these features; the documentation as well as this blog post will be updated once Azure Portal support is available.

Q: Are Availability Zones supported on Premium V4?
A: Yes! See the documentation for more details on how to get started with Premium V4 today.
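To check whether a plan is on a zone-enabled footprint, the MaximumNumberOfZones property mentioned in the FAQ can be inspected; a hedged sketch using the generic resource API (the property path and casing are assumptions based on the FAQ, and the names are placeholders):

# A quick way to check the new property described in the FAQ above.
# Depending on CLI version, the property may be exposed via the generic
# resource API as shown here; a value greater than 1 means the plan is
# on a footprint that supports Availability Zones.
az resource show \
  --resource-type "Microsoft.Web/serverfarms" \
  --name my-plan \
  --resource-group my-rg \
  --query "properties.maximumNumberOfZones"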
How to Secure your pro-code Custom Engine Agent of Microsoft 365 Copilot?

Prerequisite

This article assumes that you've already gone through the following post. Please make sure to read it before proceeding: Developing a Custom Engine Agent for Microsoft 365 Copilot Chat Using Pro-Code

In that article, we saw how to publish a Custom Engine Agent using pro-code approaches such as C#. In this post, I'd like to shift the focus to security, specifically how to protect the endpoint of our custom Microsoft 365 Copilot. Through several architectural explorations, we found an approach that seems to work well. However, I strongly encourage you to review and evaluate it carefully for your production environment.

Which Endpoints Can Be Controlled?

In the current architecture, there are three key endpoints to consider from a security perspective:

- Teams Endpoint: The entry point where users interact with the Custom Engine Agent through Microsoft Teams.
- Azure Bot Service Endpoint: The publicly accessible endpoint provided by Azure Bot Service that relays messages between Teams and your bot backend.
- ASP.NET Core Endpoint: In the previous article, we used a local devtunnel for development purposes. In a production environment, however, this would likely be hosted on Azure App Service or others.

Each of these endpoints may require different protection strategies, which we'll explore in the following sections.

1. Controlling the Teams Endpoint

When it comes to the Teams endpoint, control ultimately comes down to Teams app management within your Microsoft 365 tenant. Specifically, the manifest file for your custom Teams app (i.e., the Custom Agent) needs to be uploaded in your tenant, and access is governed via the Teams Admin Center. This isn't about controlling the endpoint itself, but rather about limiting who can access the app. You can restrict access on a per-user or per-group basis, effectively preventing malicious users inside your organization from using the app. However, you cannot restrict access at the endpoint level, nor could you prevent a malicious external organization from copying the app package. This limitation may pose a concern, especially when thinking about endpoint-level security outside your tenant's control.

2. Controlling the Azure Bot Service Endpoint

The Azure Bot Service endpoint acts as a bridge between the Teams Channel and your pro-code backend. Here, the only available security configuration is to specify the Service Principal that the agent uses. There isn't much room for granular control here—it's essentially a relay point managed by Azure Bot Service, and protection depends largely on how you secure the endpoints it connects to.

3. Controlling the ASP.NET Core Endpoint

This is where endpoint protection becomes critical. When you configure your bot in Azure Bot Service, you must expose your pro-code endpoint to the public internet. In our earlier article, we used a local devtunnel for development. But in production, you'll likely use Azure App Service or others, which results in a publicly accessible endpoint. While Microsoft provides documentation on network isolation options for Azure Bot Service, these are currently only supported when using the Direct Line channel - not the Teams channel. This means that when using Teams as the entry point, you cannot isolate the backend endpoint via a private network, making it critical to implement other security measures at the app level (e.g., token validation, IP restrictions, mutual TLS, etc.).
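Although network isolation isn't available with the Teams channel, you can still narrow who reaches an App Service-hosted bot at the platform edge. A hedged Azure CLI sketch (the resource names, rule name, and CIDR are placeholders; in practice you would allow the published Azure Bot Service address ranges or its service tag):

# A minimal sketch, assuming the bot backend is hosted on Azure App Service.
# The names and CIDR below are placeholders; in practice you would allow the
# Azure Bot Service address ranges (or its service tag) rather than this range.
az webapp config access-restriction add \
  --resource-group my-rg \
  --name my-bot-webapp \
  --rule-name allow-bot-traffic \
  --action Allow \
  --ip-address 203.0.113.0/24 \
  --priority 100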
https://learn.microsoft.com/en-us/azure/bot-service/dl-network-isolation-concept?view=azure-bot-service-4.0

Let's Review Other Articles on this Topic

There are several valuable resources that describe this topic. Since Microsoft Teams is a SaaS application, the bot endpoint (e.g., https://my-webapp-endpoint.net/api/messages) must be publicly accessible when integrated through the Teams channel.

- Is it possible to integrate Azure Bot with Teams without public access?
- How to create Azure Bot Service in a private network?

In particular, this article provides an excellent deep dive into the traffic flow between Teams and Azure Bot Service: Azure Bot Service, Microsoft Teams architecture, and message flow

In the section titled "Challenge 2: Network isolation vs. Teams connectivity," the article clearly explains why network-level isolation is fundamentally incompatible with the Teams channel. The article also outlines a practical security approach using Azure Firewall, NSG (Network Security Groups), and JWT token validation at the application level. If you're using the Teams channel, complete network isolation is not feasible—which makes sense, given that Teams itself is a SaaS platform and cannot be brought into your private network. As a result, protecting the backend bot (e.g., the ASP.NET Core endpoint) will require application-level controls, particularly JWT token validation to ensure that only trusted sources can invoke the bot. Let's now take a closer look at how to implement that in C#.

Controlling Endpoints in the ASP.NET Core Application

So, what does endpoint control look like at the application level? Let's return to the ASP.NET Core side of things and take a closer look at the default project structure. If you recall, the Program.cs in the template project contains a specific line worth revisiting. This configuration plays an important role in how the application handles and secures incoming requests. Let's take a look at that setup.

// Register the WeatherForecastAgent
builder.Services.AddTransient<WeatherForecastAgent>();

// Add AspNet token validation - ** HERE **
builder.Services.AddBotAspNetAuthentication(builder.Configuration);

// Register IStorage. For development, MemoryStorage is suitable.
// For production Agents, persisted storage should be used so
// that state survives Agent restarts, and operate correctly
// in a cluster of Agent instances.
builder.Services.AddSingleton<IStorage, MemoryStorage>();

As it turns out, the AddBotAspNetAuthentication method referenced earlier in Program.cs is actually defined in the same project, within a file named AspNetExtensions.cs. This method is where access token validation is implemented and enforced. Let's take a closer look at a key portion of the AddBotAspNetAuthentication method from AspNetExtensions.cs:

public static void AddBotAspNetAuthentication(this IServiceCollection services, IConfiguration configuration, string tokenValidationSectionName = "TokenValidation", ILogger logger = null)
{
    IConfigurationSection tokenValidationSection = configuration.GetSection(tokenValidationSectionName);
    List<string> validTokenIssuers = tokenValidationSection.GetSection("ValidIssuers").Get<List<string>>();
    List<string> audiences = tokenValidationSection.GetSection("Audiences").Get<List<string>>();

    if (!tokenValidationSection.Exists())
    {
        logger?.LogError("Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json", tokenValidationSectionName);
        throw new InvalidOperationException($"Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json");
    }

    // If ValidIssuers is empty, default for ABS Public Cloud
    if (validTokenIssuers == null || validTokenIssuers.Count == 0)
    {
        validTokenIssuers =
        [
            "https://api.botframework.com",
            "https://sts.windows.net/d6d49420-f39b-4df7-a1dc-d59a935871db/",
            "https://login.microsoftonline.com/d6d49420-f39b-4df7-a1dc-d59a935871db/v2.0",
            "https://sts.windows.net/f8cdef31-a31e-4b4a-93e4-5f571e91255a/",
            "https://login.microsoftonline.com/f8cdef31-a31e-4b4a-93e4-5f571e91255a/v2.0",
            "https://sts.windows.net/69e9b82d-4842-4902-8d1e-abc5b98a55e8/",
            "https://login.microsoftonline.com/69e9b82d-4842-4902-8d1e-abc5b98a55e8/v2.0",
        ];

        string tenantId = tokenValidationSection["TenantId"];
        if (!string.IsNullOrEmpty(tenantId))
        {
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV1, tenantId));
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV2, tenantId));
        }
    }

    if (audiences == null || audiences.Count == 0)
    {
        throw new ArgumentException($"{tokenValidationSectionName}:Audiences requires at least one value");
    }

    bool isGov = tokenValidationSection.GetValue("IsGov", false);
    bool azureBotServiceTokenHandling = tokenValidationSection.GetValue("AzureBotServiceTokenHandling", true);

    // If the `AzureBotServiceOpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`.
    // This is what is used to authenticate ABS tokens.
    string azureBotServiceOpenIdMetadataUrl = tokenValidationSection["AzureBotServiceOpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(azureBotServiceOpenIdMetadataUrl))
    {
        azureBotServiceOpenIdMetadataUrl = isGov ? AuthenticationConstants.GovAzureBotServiceOpenIdMetadataUrl : AuthenticationConstants.PublicAzureBotServiceOpenIdMetadataUrl;
    }

    // If the `OpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`.
    // This is what is used to authenticate Entra ID tokens.
    string openIdMetadataUrl = tokenValidationSection["OpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(openIdMetadataUrl))
    {
        openIdMetadataUrl = isGov ? AuthenticationConstants.GovOpenIdMetadataUrl : AuthenticationConstants.PublicOpenIdMetadataUrl;
    }

    TimeSpan openIdRefreshInterval = tokenValidationSection.GetValue("OpenIdMetadataRefresh", BaseConfigurationManager.DefaultAutomaticRefreshInterval);

    _ = services.AddAuthentication(options =>
    {
        options.DefaultAuthenticateScheme = JwtBearerDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = JwtBearerDefaults.AuthenticationScheme;
    })
    .AddJwtBearer(options =>
    {
        options.SaveToken = true;
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ClockSkew = TimeSpan.FromMinutes(5),
            ValidIssuers = validTokenIssuers,
            ValidAudiences = audiences,
            ValidateIssuerSigningKey = true,
            RequireSignedTokens = true,
        };

        // Using Microsoft.IdentityModel.Validators
        options.TokenValidationParameters.EnableAadSigningKeyIssuerValidation();

        options.Events = new JwtBearerEvents
        {
            // Create a ConfigurationManager based on the requestor. This is to handle ABS non-Entra tokens.
            OnMessageReceived = async context =>
            {
                string authorizationHeader = context.Request.Headers.Authorization.ToString();

                if (string.IsNullOrEmpty(authorizationHeader))
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

                string[] parts = authorizationHeader?.Split(' ');
                if (parts.Length != 2 || parts[0] != "Bearer")
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

                JwtSecurityToken token = new(parts[1]);
                string issuer = token.Claims.FirstOrDefault(claim => claim.Type == AuthenticationConstants.IssuerClaim)?.Value;

                if (azureBotServiceTokenHandling && AuthenticationConstants.BotFrameworkTokenIssuer.Equals(issuer))
                {
                    // Use the Bot Framework authority for this configuration manager
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(azureBotServiceOpenIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(azureBotServiceOpenIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }
                else
                {
                    context.Options.TokenValidationParameters.ConfigurationManager = _openIdMetadataCache.GetOrAdd(openIdMetadataUrl, key =>
                    {
                        return new ConfigurationManager<OpenIdConnectConfiguration>(openIdMetadataUrl, new OpenIdConnectConfigurationRetriever(), new HttpClient())
                        {
                            AutomaticRefreshInterval = openIdRefreshInterval
                        };
                    });
                }

                await Task.CompletedTask.ConfigureAwait(false);
            },
            OnTokenValidated = context =>
            {
                logger?.LogDebug("TOKEN Validated");
                return Task.CompletedTask;
            },
            OnForbidden = context =>
            {
                logger?.LogWarning("Forbidden: {m}", context.Result.ToString());
                return Task.CompletedTask;
            },
            OnAuthenticationFailed = context =>
            {
                logger?.LogWarning("Auth Failed {m}", context.Exception.ToString());
                return Task.CompletedTask;
            }
        };
    });
}

From examining the code, we can see that it reads configuration settings from the appsettings.{your-env}.json file and uses them during token validation. In particular, the following line stands out:

TokenValidationParameters.ValidAudiences = audiences;

This ensures that only tokens issued for the configured audience (i.e., your Azure Bot Service's Service Principal) will be accepted. Any requests carrying tokens with mismatched audiences will be rejected during validation.

One critical observation is that if no access token is provided at all, the code effectively lets the request through without enforcing validation. This means that if the Service Principal is misconfigured or lacks proper permissions, and therefore no token is issued with the request, the bot may still continue processing it without rejecting the request. This could potentially create a security loophole, especially if the backend API is publicly accessible. The relevant fragment is shown below:
OnMessageReceived = async context =>
{
    string authorizationHeader = context.Request.Headers.Authorization.ToString();

    if (string.IsNullOrEmpty(authorizationHeader))
    {
        // Default to AadTokenValidation handling
        context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
        await Task.CompletedTask.ConfigureAwait(false);
        return;
    }

Additional Security Concerns and Improvements

Another point worth noting about the current code is that the Custom Engine Agent app can be copied and uploaded to a different Entra ID tenant, and it would still work. (Admittedly, this might be intentional, since the architecture assumes providing Custom Engine Agent services to multiple organizations.) The project template and Teams settings raise two key security concerns that we should address:

1. Reject requests when the token is missing - the token should not be empty.
2. Block access from unknown or unauthorized Entra ID tenants.

To enforce the above, you will need to update the Service Principal configuration accordingly. Specifically, open the Service Principal's API permissions tab and add the following permission: User.Read.All. Without this permission, access tokens will not be issued, making token validation impossible.

After updating the Service Principal permissions, run your ASP.NET Core app and set a breakpoint around the following code to inspect the contents of the token included in the Authorization header. This will help you verify whether the token is correctly issued and contains the expected claims.

string authorizationHeader = context.Request.Headers.Authorization.ToString();

The token is Base64 encoded, so let's decode it to inspect its contents. I asked Copilot to help decode the token so we can better understand the claims and data included inside. After decoding the token (some parts are redacted for privacy), we can see that:

- The aud (audience) claim contains the Service Principal's client ID.
- The serviceurl claim includes the Entra ID tenant ID.

I attempted to configure the authorization settings to include the Tenant ID directly in the access token claims, but was not successful this time. Below is sample code that implements the following requirements:

1. Reject requests with an empty or missing token.
2. Deny access from unknown Entra ID tenants.

This is the sample code for "1. Reject requests with an empty or missing token". I've added comments in the code to clearly indicate what was changed.

public static void AddBotAspNetAuthentication(this IServiceCollection services, IConfiguration configuration, string tokenValidationSectionName = "TokenValidation", ILogger logger = null)
{
    IConfigurationSection tokenValidationSection = configuration.GetSection(tokenValidationSectionName);
    List<string> validTokenIssuers = tokenValidationSection.GetSection("ValidIssuers").Get<List<string>>();
    List<string> audiences = tokenValidationSection.GetSection("Audiences").Get<List<string>>();

    if (!tokenValidationSection.Exists())
    {
        logger?.LogError("Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json", tokenValidationSectionName);
        throw new InvalidOperationException($"Missing configuration section '{tokenValidationSectionName}'. This section is required to be present in appsettings.json");
    }

    // If ValidIssuers is empty, default for ABS Public Cloud
    if (validTokenIssuers == null || validTokenIssuers.Count == 0)
    {
        validTokenIssuers =
        [
            "https://api.botframework.com",
            "https://sts.windows.net/d6d49420-f39b-4df7-a1dc-d59a935871db/",
            "https://login.microsoftonline.com/d6d49420-f39b-4df7-a1dc-d59a935871db/v2.0",
            "https://sts.windows.net/f8cdef31-a31e-4b4a-93e4-5f571e91255a/",
            "https://login.microsoftonline.com/f8cdef31-a31e-4b4a-93e4-5f571e91255a/v2.0",
            "https://sts.windows.net/69e9b82d-4842-4902-8d1e-abc5b98a55e8/",
            "https://login.microsoftonline.com/69e9b82d-4842-4902-8d1e-abc5b98a55e8/v2.0",
        ];

        string tenantId = tokenValidationSection["TenantId"];
        if (!string.IsNullOrEmpty(tenantId))
        {
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV1, tenantId));
            validTokenIssuers.Add(string.Format(CultureInfo.InvariantCulture, AuthenticationConstants.ValidTokenIssuerUrlTemplateV2, tenantId));
        }
    }

    if (audiences == null || audiences.Count == 0)
    {
        throw new ArgumentException($"{tokenValidationSectionName}:Audiences requires at least one value");
    }

    bool isGov = tokenValidationSection.GetValue("IsGov", false);
    bool azureBotServiceTokenHandling = tokenValidationSection.GetValue("AzureBotServiceTokenHandling", true);

    // If the `AzureBotServiceOpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`.
    // This is what is used to authenticate ABS tokens.
    string azureBotServiceOpenIdMetadataUrl = tokenValidationSection["AzureBotServiceOpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(azureBotServiceOpenIdMetadataUrl))
    {
        azureBotServiceOpenIdMetadataUrl = isGov ? AuthenticationConstants.GovAzureBotServiceOpenIdMetadataUrl : AuthenticationConstants.PublicAzureBotServiceOpenIdMetadataUrl;
    }

    // If the `OpenIdMetadataUrl` setting is not specified, use the default based on `IsGov`.
    // This is what is used to authenticate Entra ID tokens.
    string openIdMetadataUrl = tokenValidationSection["OpenIdMetadataUrl"];
    if (string.IsNullOrEmpty(openIdMetadataUrl))
    {
        openIdMetadataUrl = isGov ? AuthenticationConstants.GovOpenIdMetadataUrl : AuthenticationConstants.PublicOpenIdMetadataUrl;
    }

    TimeSpan openIdRefreshInterval = tokenValidationSection.GetValue("OpenIdMetadataRefresh", BaseConfigurationManager.DefaultAutomaticRefreshInterval);

    _ = services.AddAuthentication(options =>
    {
        options.DefaultAuthenticateScheme = JwtBearerDefaults.AuthenticationScheme;
        options.DefaultChallengeScheme = JwtBearerDefaults.AuthenticationScheme;
    })
    .AddJwtBearer(options =>
    {
        options.SaveToken = true;
        options.TokenValidationParameters = new TokenValidationParameters
        {
            ValidateIssuer = true,
            ValidateAudience = true, // this option enables validating the audience claim against the audiences values
            ValidateLifetime = true,
            ClockSkew = TimeSpan.FromMinutes(5),
            ValidIssuers = validTokenIssuers,
            ValidAudiences = audiences,
            ValidateIssuerSigningKey = true,
            RequireSignedTokens = true,
        };

        // Using Microsoft.IdentityModel.Validators
        options.TokenValidationParameters.EnableAadSigningKeyIssuerValidation();

        options.Events = new JwtBearerEvents
        {
            // Create a ConfigurationManager based on the requestor. This is to handle ABS non-Entra tokens.
            OnMessageReceived = async context =>
            {
                string authorizationHeader = context.Request.Headers.Authorization.ToString();

                if (string.IsNullOrEmpty(authorizationHeader))
                {
                    // Default to AadTokenValidation handling
                    // context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    // await Task.CompletedTask.ConfigureAwait(false);
                    // return;
                    //
                    // Fail the request when the token is empty
                    context.Fail("Authorization header is missing.");
                    logger?.LogWarning("Authorization header is missing.");
                    return;
                }

                string[] parts = authorizationHeader?.Split(' ');
                if (parts.Length != 2 || parts[0] != "Bearer")
                {
                    // Default to AadTokenValidation handling
                    context.Options.TokenValidationParameters.ConfigurationManager ??= options.ConfigurationManager as BaseConfigurationManager;
                    await Task.CompletedTask.ConfigureAwait(false);
                    return;
                }

Next, let's implement "2. Deny access from unknown Entra ID tenants". We can retrieve the Tenant ID inside the MessageActivityAsync method of Bot/WeatherAgentBot.cs. Let's extend the logic by referring to the following sample code to capture and utilize the Tenant ID within that method.

https://github.com/OfficeDev/microsoft-teams-apps-company-communicator/blob/dcf3b169084d3fff7c1e4c5b68718fb33c3391dd/Source/CompanyCommunicator/Bot/CompanyCommunicatorBotFilterMiddleware.cs#L44

Here is how you can extend the logic to retrieve and use the Tenant ID within the MessageActivityAsync method:

using MyM365Agent1.Bot.Agents;
using Microsoft.Agents.Builder;
using Microsoft.Agents.Builder.App;
using Microsoft.Agents.Builder.State;
using Microsoft.Agents.Core.Models;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.Extensions.DependencyInjection.Extensions;

namespace MyM365Agent1.Bot;

public class WeatherAgentBot : AgentApplication
{
    private WeatherForecastAgent _weatherAgent;
    private Kernel _kernel;
    private readonly string _tenantId;
    private readonly ILogger<WeatherAgentBot> _logger;

    public WeatherAgentBot(AgentApplicationOptions options, Kernel kernel, IConfiguration configuration, ILogger<WeatherAgentBot> logger) : base(options)
    {
        _kernel = kernel ?? throw new ArgumentNullException(nameof(kernel));
        _logger = logger ?? throw new ArgumentNullException(nameof(logger));

        OnConversationUpdate(ConversationUpdateEvents.MembersAdded, WelcomeMessageAsync);
        OnActivity(ActivityTypes.Message, MessageActivityAsync, rank: RouteRank.Last);

        // Get TenantId from TokenValidation section
        var tokenValidationSection = configuration.GetSection("TokenValidation");
        _tenantId = tokenValidationSection["TenantId"];
    }

    protected async Task MessageActivityAsync(ITurnContext turnContext, ITurnState turnState, CancellationToken cancellationToken)
    {
        // add validation of tenant ID
        var activity = turnContext.Activity;

        // Log: Received activity
        _logger.LogInformation("Received message activity from: {FromId}, AadObjectId:{AadObjectId}, TenantId: {TenantId}, ChannelId: {ChannelId}, ConversationType: {ConversationType}",
            activity?.From?.Id, activity?.From?.AadObjectId, activity?.Conversation?.TenantId, activity?.ChannelId, activity?.Conversation?.ConversationType);

        if (activity.ChannelId != "msteams" // Ignore messages not from Teams
            || activity.Conversation?.ConversationType?.ToLowerInvariant() != "personal" // Ignore messages from team channels or group chats
            || string.IsNullOrEmpty(activity.From?.AadObjectId) // Ignore if not an AAD user (e.g., bots, guest users)
            || (!string.IsNullOrEmpty(_tenantId) && !string.Equals(activity.Conversation?.TenantId, _tenantId, StringComparison.OrdinalIgnoreCase))) // Ignore if tenant ID does not match
        {
            _logger.LogWarning("Unauthorized serviceUrl detected: {ServiceUrl}. Expected to contain TenantId: {TenantId}", activity?.ServiceUrl, _tenantId);
            await turnContext.SendActivityAsync("Unauthorized service URL.", cancellationToken: cancellationToken);
            return;
        }

        // Setup local service connection
        ServiceCollection serviceCollection =
        [
            new ServiceDescriptor(typeof(ITurnState), turnState),
            new ServiceDescriptor(typeof(ITurnContext), turnContext),
            new ServiceDescriptor(typeof(Kernel), _kernel),
        ];

        // Start a Streaming Process
        await turnContext.StreamingResponse.QueueInformativeUpdateAsync("Working on a response for you");

        ChatHistory chatHistory = turnState.GetValue("conversation.chatHistory", () => new ChatHistory());
        _weatherAgent = new WeatherForecastAgent(_kernel, serviceCollection.BuildServiceProvider());

        // Invoke the WeatherForecastAgent to process the message
        WeatherForecastAgentResponse forecastResponse = await _weatherAgent.InvokeAgentAsync(turnContext.Activity.Text, chatHistory);
        if (forecastResponse == null)
        {
            turnContext.StreamingResponse.QueueTextChunk("Sorry, I couldn't get the weather forecast at the moment.");
            await turnContext.StreamingResponse.EndStreamAsync(cancellationToken);
            return;
        }

        // Create a response message based on the response content type from the WeatherForecastAgent
        // Send the response message back to the user.
        switch (forecastResponse.ContentType)
        {
            case WeatherForecastAgentResponseContentType.Text:
                turnContext.StreamingResponse.QueueTextChunk(forecastResponse.Content);
                break;
            case WeatherForecastAgentResponseContentType.AdaptiveCard:
                turnContext.StreamingResponse.FinalMessage = MessageFactory.Attachment(new Attachment()
                {
                    ContentType = "application/vnd.microsoft.card.adaptive",
                    Content = forecastResponse.Content,
                });
                break;
            default:
                break;
        }

        await turnContext.StreamingResponse.EndStreamAsync(cancellationToken); // End the streaming response
    }

    protected async Task WelcomeMessageAsync(ITurnContext turnContext, ITurnState turnState, CancellationToken cancellationToken)
    {
        foreach (ChannelAccount member in turnContext.Activity.MembersAdded)
        {
            if (member.Id != turnContext.Activity.Recipient.Id)
            {
                await turnContext.SendActivityAsync(MessageFactory.Text("Hello and Welcome! I'm here to help with all your weather forecast needs!"), cancellationToken);
            }
        }
    }
}

Since we're at it, I've added various validations as well. I hope this will be helpful as a reference for everyone.
Important Changes to App Service Managed Certificates: Is Your Certificate Affected?

Overview

As part of an upcoming industry-wide change, DigiCert, the Certificate Authority (CA) for Azure App Service Managed Certificates (ASMC), is required to migrate to a new validation platform to meet multi-perspective issuance corroboration (MPIC) requirements. While most certificates will not be impacted by this change, certain site configurations and setups may prevent certificate issuance or renewal starting July 28, 2025.

Update (August 5, 2025)

We've published a Microsoft Learn document titled "App Service Managed Certificate (ASMC) changes – July 28, 2025" that contains more in-depth mitigation guidance and a growing FAQ section to support the changes outlined in this blog post. While the blog currently contains the most complete overview, the documentation will soon be updated to reflect all blog content. Going forward, any new information or clarifications will be added to the documentation page, so we recommend bookmarking it for the latest guidance.

What Will the Change Look Like?

For most customers: No disruption. Certificate issuance and renewals will continue as expected for eligible site configurations.

For impacted scenarios: Certificate requests will fail (no certificate issued) starting July 28, 2025, if your site configuration is not supported. Existing certificates will remain valid until their expiration (up to six months after last renewal).

Impacted Scenarios

You will be affected by this change if any of the following apply to your site configurations:

1. Your site is not publicly accessible. Public accessibility to your app is required. If your app is only accessible privately (e.g., requiring a client certificate for access, disabling public network access, using private endpoints or IP restrictions), you will not be able to create or renew a managed certificate. Other site configurations or setup methods not explicitly listed here that restrict public access, such as firewalls, authentication gateways, or any custom access policies, can also impact eligibility for managed certificate issuance or renewal.
Action: Ensure your app is accessible from the public internet. If you need to limit access to your app, you must acquire your own SSL certificate and add it to your site.

2. Your site uses Azure Traffic Manager "nested" or "external" endpoints. Only "Azure Endpoints" on Traffic Manager will be supported for certificate creation and renewal. "Nested endpoints" and "External endpoints" will not be supported.
Action: Transition to using "Azure Endpoints". If you cannot, you must obtain a different SSL certificate for your domain and add it to your site.

3. Your site relies on a *.trafficmanager.net domain. Certificates for *.trafficmanager.net domains will not be supported for creation or renewal.
Action: Add a custom domain to your app and point the custom domain to your *.trafficmanager.net domain. After that, secure the custom domain with a new SSL certificate.

If none of the above applies, no further action is required.

How to Identify Impacted Resources?

To assist with the upcoming changes, you can use Azure Resource Graph (ARG) queries to help identify resources that may be affected under each scenario. Please note that these queries are provided as a starting point and may not capture every configuration. Review your environment for any unique setups or custom configurations.
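Before running the queries, a quick spot check of a single app can be handy; a hedged Azure CLI sketch (names are placeholders, and this only surfaces two of the settings discussed above, not firewalls, private endpoints, or other custom setups):

# A minimal sketch, assuming a single App Service app; names are placeholders.
# A "Disabled" publicNetworkAccess value or clientCertEnabled=true are two of
# the signals that the app may not be publicly reachable for validation.
az webapp show \
  --name my-webapp \
  --resource-group my-rg \
  --query "{publicNetworkAccess: publicNetworkAccess, clientCertEnabled: clientCertEnabled}"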
Scenario 1: Sites Not Publicly Accessible

This ARG query retrieves a list of sites that either have the public network access property disabled or are configured to use client certificates. It then filters for sites that are using App Service Managed Certificates (ASMC) for their custom hostname SSL bindings. These certificates are the ones that could be affected by the upcoming changes. However, please note that this query does not provide complete coverage, as there may be additional configurations impacting public access to your app that are not included here. Ultimately, this query serves as a helpful guide, but a thorough review of your environment is recommended. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.

// ARG Query: Identify App Service sites that commonly restrict public access and use ASMC for custom hostname SSL bindings
resources
| where type == "microsoft.web/sites"
// Extract relevant properties for public access and client certificate settings
| extend
    publicNetworkAccess = tolower(tostring(properties.publicNetworkAccess)),
    clientCertEnabled = tolower(tostring(properties.clientCertEnabled))
// Filter for sites that either have public network access disabled
// or have client certificates enabled (both can restrict public access)
| where publicNetworkAccess == "disabled" or clientCertEnabled != "false"
// Expand the list of SSL bindings for each site
| mv-expand hostNameSslState = properties.hostNameSslStates
| extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint)
// Only consider custom domains (exclude default *.azurewebsites.net) and sites with an SSL certificate bound
| where tolower(hostName) !endswith "azurewebsites.net" and isnotempty(thumbprint)
// Select key site properties for output
| project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint, publicNetworkAccess, clientCertEnabled
// Join with certificates to find only those using App Service Managed Certificates (ASMC)
// ASMCs are identified by the presence of the "canonicalName" property
| join kind=inner (
    resources
    | where type == "microsoft.web/certificates"
    | extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property
    | where isnotempty(canonicalName)
    | project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
) on $left.thumbprint == $right.certThumbprint
// Final output: sites with restricted public access and using ASMC for custom hostname SSL bindings
| project siteName, siteId, siteResourceGroup, publicNetworkAccess, clientCertEnabled, thumbprint, certName, certId, certResourceGroup, certExpiration, canonicalName

Scenario 2: Traffic Manager Endpoint Types

For this scenario, please manually review your Traffic Manager profile configurations to ensure only "Azure Endpoints" are in use. We recommend inspecting your Traffic Manager profiles directly in the Azure portal or using relevant APIs to confirm your setup and ensure compliance with the new requirements. A CLI starting point for that review is sketched below.
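Something like the following Azure CLI query could surface each profile's endpoint types for that manual review (a sketch; the resource group and output shape are illustrative):

# A hedged helper for the manual review above; it lists each profile's
# endpoint types so you can spot anything other than azureEndpoints.
# The resource group is a placeholder; omit it to scan the subscription.
az network traffic-manager profile list \
  --resource-group my-rg \
  --query "[].{profile: name, endpointTypes: endpoints[].type}" \
  -o json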
Scenario 3: Certificates Issued to *.trafficmanager.net Domains

This ARG query helps you identify App Service Managed Certificates (ASMC) that were issued to *.trafficmanager.net domains. In addition, it also checks whether any web apps are currently using those certificates for custom domain SSL bindings. You can copy this query, paste it into Azure Resource Graph Explorer, and then click "Run query" to view the results for your environment.

// ARG Query: Identify App Service Managed Certificates (ASMC) issued to *.trafficmanager.net domains
// Also checks if any web apps are currently using those certificates for custom domain SSL bindings
resources
| where type == "microsoft.web/certificates"
// Extract the certificate thumbprint and canonicalName (ASMCs have a canonicalName property)
| extend certThumbprint = tostring(properties.thumbprint), canonicalName = tostring(properties.canonicalName) // Only ASMC uses the "canonicalName" property
// Filter for certificates issued to *.trafficmanager.net domains
| where canonicalName endswith "trafficmanager.net"
// Select key certificate properties for output
| project certName = name, certId = id, certResourceGroup = tostring(properties.resourceGroup), certExpiration = properties.expirationDate, certThumbprint, canonicalName
// Join with web apps to see if any are using these certificates for SSL bindings
| join kind=leftouter (
    resources
    | where type == "microsoft.web/sites"
    // Expand the list of SSL bindings for each site
    | mv-expand hostNameSslState = properties.hostNameSslStates
    | extend hostName = tostring(hostNameSslState.name), thumbprint = tostring(hostNameSslState.thumbprint)
    // Only consider bindings for *.trafficmanager.net custom domains with a certificate bound
    | where tolower(hostName) endswith "trafficmanager.net" and isnotempty(thumbprint)
    // Select key site properties for output
    | project siteName = name, siteId = id, siteResourceGroup = resourceGroup, thumbprint
) on $left.certThumbprint == $right.thumbprint
// Final output: ASMCs for *.trafficmanager.net domains and any web apps using them
| project certName, certId, certResourceGroup, certExpiration, canonicalName, siteName, siteId, siteResourceGroup

Ongoing Updates

We will continue to update this post with any new queries or important changes as they become available. Be sure to check back for the latest information.

Note on Comments

We hope this information helps you navigate the upcoming changes. To keep this post clear and focused, comments are closed. If you have questions, need help, or want to share tips or alternative detection methods, please visit our official support channels or the Microsoft Q&A, where our team and the community can assist you.
Strategic Solutions for Seamless Integration of Third-Party SaaS

Modern systems must be modular and interoperable by design. Integration is no longer a feature; it's a requirement. Developers are expected to build architectures that connect easily with third-party platforms, but too often, core systems are designed in isolation. This disconnect creates friction for downstream teams and slows delivery.

At Microsoft, SaaS platforms like SAP SuccessFactors and Eightfold support Talent Acquisition by handling functions such as requisition tracking, application workflows, and interview coordination. These tools help reduce costs and free up engineering focus for high-priority areas like Azure and AI. The real challenge is integrating them with internal systems such as Demand Planning, Offer Management, and Employee Central.

This blog post outlines a strategy centered around two foundational components: an Integration and Orchestration Layer, and a Messaging Platform. Together, these enable real-time communication, consistent data models, and scalable integration. While Talent Acquisition is the use case here, the architectural patterns apply broadly across domains. Whether you're embedding AI pipelines, managing edge deployments, or building platform services, thoughtful integration needs to be built into the foundation, not bolted on later.
Quest 9: I want to use a ready-made template

Building robust, scalable AI apps is tough, especially when you want to move fast, follow best practices, and avoid being bogged down by endless setup and configuration. In this quest, you'll discover how to accelerate your journey from prototype to production by leveraging ready-made templates and modern cloud tools. Say goodbye to decision fatigue and hello to streamlined, industry-approved workflows you can make your own.

👉 Want to catch up on the full program or grab more quests? https://aka.ms/JSAIBuildathon
💬 Got questions or want to hang with other builders? Join us on Discord — head to the #js-ai-build-a-thon channel.

🚀 What You'll Build

- A fully functional AI application deployed on Azure, customized to solve a real problem that matters to you.
- A codebase powered by a production-grade template, complete with all the necessary infrastructure-as-code, deployment scripts, and best practices already baked in.
- Your own proof-of-concept or MVP, ready to scale or show off to the world.

🛠️ What You Need

✅ GitHub account
✅ Visual Studio Code
✅ Node.js
✅ Azure subscription (free trials and student credits available)
✅ Azure Developer CLI (azd)
✅ The curiosity to solve a meaningful problem!

🧩 Concepts You'll Explore

Azure Developer CLI (azd)
Learn how azd, the developer-first command-line tool, simplifies authentication, setup, deployment, and teardown for Azure apps. With intuitive commands like azd up and azd deploy, you can go from zero to running in the cloud, no deep cloud expertise required. (A sketch of the basic workflow follows at the end of this quest.)

Production-Ready Templates
Explore a gallery of customizable templates designed to get your app up and running fast. These templates aren't just "hello world"; they feature scalable architectures, sample code, and reusable infrastructure assets to launch everything from chatbots to RAG apps to full-stack solutions.

Infrastructure as Code (IaC)
See how every template bundles configuration files and scripts to automatically provision the cloud resources you need. You'll get a taste of how top teams ship secure, repeatable, and maintainable systems without manually clicking through Azure dashboards.

Best Practices by Default
Templates incorporate industry best practices for code structure, deployment, and scalability. You'll spend less time researching how to "do it right" and more time customizing your application to fit your unique use case.

Customization for Real-World Problems
Pick a template and make it yours! Whether you're building a copilot, a chat-enabled app, or a serverless API, you'll learn how to tweak the frontend, swap out backend logic, connect your own data sources, and shape the solution to solve a real-world problem you care about.

🌟 Bonus Resources

Here are some additional resources to help you learn more about the Azure Developer CLI (azd) and the templates available:

- Kickstart JS/TS projects with azd Templates
- Kickstart your JavaScript projects with azd on YouTube

⏭️ What next?

With production-ready templates and the Azure Developer CLI at your side, you're ready to move from "just an idea" to a deployable, scalable solution without reinventing the wheel. Start with the right foundation, customize with confidence, and ship your next AI app like a pro! Once you have your project done, ensure you submit it to GitHub - Azure-Samples/JS-AI-Build-a-thon
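For reference, here is a minimal sketch of the azd workflow mentioned under Concepts; the template name is a placeholder, so pick one from the template gallery:

# A minimal sketch of the azd flow; the template name is illustrative.
azd auth login                                  # authenticate to Azure
azd init --template azure-search-openai-demo    # scaffold from a gallery template
azd up                                          # provision resources and deploy in one step
# ...iterate on your code...
azd deploy                                      # redeploy application code only
azd down                                        # tear everything down when finished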
Superfast using Web App and Managed Identity to invoke Function App triggers

TOC

1. Introduction
2. Setup
3. References

1. Introduction

Many enterprises prefer not to use App Keys to invoke Function App triggers, as they are concerned that these fixed strings might be exposed. This method allows you to invoke Function App triggers using Managed Identity for enhanced security. I will provide examples in both Bash and Node.js.

2. Setup

1. Create a Linux Python 3.11 Function App

1.1. Configure Authentication to block unauthenticated callers while allowing the Web App's Managed Identity to authenticate.

Identity Provider: Microsoft
Choose a tenant for your application and its users: Workforce Configuration
App registration type: Create
Name: [automatically generated]
Client Secret expiration: [fit-in your business purpose]
Supported Account Type: Any Microsoft Entra Directory - Multi-Tenant
Client application requirement: Allow requests from any application
Identity requirement: Allow requests from any identity
Tenant requirement: Use default restrictions based on issuer
Token store: [checked]

1.2. Create an anonymous trigger. Since your app is already protected by the App Registration, additional Function App-level protection is unnecessary; otherwise, you would need a Function Key to trigger it.

1.3. Once the Function App is configured, try accessing the endpoint directly—you should receive a 401 Unauthorized error, confirming that triggers cannot be accessed without proper Managed Identity authorization.

1.4. After making these changes, wait 10 minutes for the settings to take effect.

2. Create a Linux Node.js 20 Web App, Obtain an Access Token, and Invoke the Function App Trigger (Bash Example)

2.1. Enable System Assigned Managed Identity in the Web App settings.

2.2. Open the Kudu SSH Console for the Web App.

2.3. Run the following commands, making the necessary modifications:

- subscriptionsID → Replace with your Subscription ID.
- resourceGroupsID → Replace with your Resource Group ID.
- application_id_uri → Replace with the Application ID URI from your Function App's App Registration.
- https://az-9640-myfa.azurewebsites.net/api/my_test_trigger → Replace with the corresponding Function App trigger URL.

# Please setup the target resource to yours
subscriptionsID="01d39075-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
resourceGroupsID="XXXX"

# Variable Setting (No need to change)
identityEndpoint="$IDENTITY_ENDPOINT"
identityHeader="$IDENTITY_HEADER"
application_id_uri="api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# Install necessary tool
apt install -y jq

# Get Access Token
tokenUri="${identityEndpoint}?resource=${application_id_uri}&api-version=2019-08-01"
accessToken=$(curl -s -H "Metadata: true" -H "X-IDENTITY-HEADER: $identityHeader" "$tokenUri" | jq -r '.access_token')
echo "Access Token: $accessToken"

# Run Trigger
response=$(curl -s -o response.json -w "%{http_code}" -X GET "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger" -H "Authorization: Bearer $accessToken")
echo "HTTP Status Code: $response"
echo "Response Body:"
cat response.json

2.4. If everything is set up correctly, you should see a successful invocation result.
3. Invoke the Function App Trigger Using Web App (Node.js Example)

I also provide an example below, which you can modify accordingly. Save it to /home/site/wwwroot/callFunctionApp.js and run it:

cd /home/site/wwwroot/
vi callFunctionApp.js
npm init -y
npm install @azure/identity axios
node callFunctionApp.js

// callFunctionApp.js
const { DefaultAzureCredential } = require("@azure/identity");
const axios = require("axios");

async function callFunctionApp() {
  try {
    const applicationIdUri = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; // Change here
    const credential = new DefaultAzureCredential();

    console.log("Requesting token...");
    const tokenResponse = await credential.getToken(applicationIdUri);
    if (!tokenResponse || !tokenResponse.token) {
      throw new Error("Failed to acquire access token");
    }
    const accessToken = tokenResponse.token;
    console.log("Token acquired:", accessToken);

    const apiUrl = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"; // Change here
    console.log("Calling the API now...");
    const response = await axios.get(apiUrl, {
      headers: {
        Authorization: `Bearer ${accessToken}`,
      },
    });

    console.log("HTTP Status Code:", response.status);
    console.log("Response Body:", response.data);
  } catch (error) {
    console.error("Failed to call the function", error.response ? error.response.data : error.message);
  }
}

callFunctionApp();

3. References

- Tutorial: Managed Identity to Invoke Azure Functions | Microsoft Learn
- How to Invoke Azure Function App with Managed Identity | by Krizzia 🤖 | Medium
- Configure Microsoft Entra authentication - Azure App Service | Microsoft Learn
Validating Change Requests with Kubernetes Admission Controllers

Promoting an application or infrastructure change into production often comes with a requirement to follow a change control process. This ensures that changes to production are properly reviewed and that they adhere to required approvals, change windows and QA processes. Often this change request (CR) process will be conducted using a system for recording and auditing the change request and the outcome.

When deploying a release, there will often be places in the process to go through this change control workflow. This may be part of a release pipeline, it may be managed in a pull request or it may be a manual process. Ultimately, by the time the actual changes are made to production infrastructure or applications, they should already be approved. This relies on the appropriate controls and restrictions being in place to make sure this happens. When it comes to the point of deploying resources into production Kubernetes clusters, they should have already been through a CR process. However, what if you wanted a way to validate that this is the case, and block anything from being deployed that does not have an approved CR, providing a backstop to ensure that no unapproved resources get deployed? Let's take a look at how we can use an Admission Controller to do this.

Admission Controllers

A Kubernetes Admission Controller is a mechanism to provide a checkpoint during a deployment that validates resources and applies rules and policies before a resource is accepted into the cluster. Any request to create, update or delete (CRUD) a resource is first run through any applicable admission controllers to check if it violates any of the required rules. Only if all admission controllers allow the request is it then processed. Kubernetes includes some built-in admission controllers, but you can also create your own. Admission controllers are essentially webhooks that are registered with the Kubernetes API server. When a CRUD request is processed by the API server, it calls any of these webhooks that are registered, and processes the response. When creating your own admission controller, you would usually implement the webhook as a pod running in the cluster.

There are three types of admission controller webhooks:

- MutatingAdmissionWebhook: Can modify the incoming object before it is persisted (e.g., injecting sidecars).
- ValidatingAdmissionWebhook: Can only approve or reject the request based on validation logic.
- ValidatingAdmissionPolicy: Validation logic is embedded in the API, rather than requiring a separate web service.

For our scenario we are going to use a ValidatingAdmissionWebhook, as we only want to approve or reject a request based on its change request status.

Sample Code

In this article, we are not going to go line by line through the code for this admission controller; however, you can see an example implementation in this repo. In this example, we do not build out the full web service for validating change requests themselves. We have some pre-defined CR IDs with pre-configured statuses returned by the application. In a real-world implementation your web service would call out to your change management solution to get the current status of the change request. This does not impact how you would build the admission controller, just the business logic inside your controller.
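For orientation, a validating webhook replies to the API server with an AdmissionReview object; a minimal sketch of the response body our kind of controller would return when rejecting a request (the uid must echo the incoming request's uid):

{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<uid copied from the incoming AdmissionReview request>",
    "allowed": false,
    "status": {
      "message": "Change CHG-2025-999 is not approved (status: pending)"
    }
  }
}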
Components

Our admission controller consists of several components:

Application

Our actual admission controller application runs an HTTP service that receives the request from the API server calling the webhook, processes it, applies business logic, and returns a response. In our example this service has been written in Go, but you can use whatever language you like. Your service must meet the API contract defined for the admission webhook. Our application does the following:

Reads the incoming change body YAML and extracts the change ID from the change.company.com/id annotation that should be applied to the resource. We also support the argocd.argoproj.io/change-id and deployment.company.com/change-id annotations.

func extractChangeID(req *admissionv1.AdmissionRequest) string {
    // Try to extract change ID from object annotations
    obj := req.Object.Raw
    var objMap map[string]interface{}
    if err := json.Unmarshal(obj, &objMap); err != nil {
        return ""
    }
    if metadata, ok := objMap["metadata"].(map[string]interface{}); ok {
        if annotations, ok := metadata["annotations"].(map[string]interface{}); ok {
            // Look for change ID in various annotation formats
            if changeID, ok := annotations["change.company.com/id"].(string); ok {
                return changeID
            }
            if changeID, ok := annotations["argocd.argoproj.io/change-id"].(string); ok {
                return changeID
            }
            if changeID, ok := annotations["deployment.company.com/change-id"].(string); ok {
                return changeID
            }
        }
    }
    return ""
}

If it does not find the required annotation, it immediately fails the validation, as no CR is present.

if changeID == "" {
    // Reject resources without change ID annotation
    klog.Infof("No change ID found, rejecting request")
    ac.respond(w, &admissionReview, false, "Change ID annotation is required")
    return
}

If the CR is present, it validates it. In our demo application this is checked against a hard-coded list of CRs, but in the real world, this is where you would call out to your external change management solution to get the CR with that ID. There are three possible outcomes here:

1. The CR ID does not match an ID in our system: the validation fails.
2. The CR does match an ID in our system, but the CR is not approved: the validation fails.
3. The CR does match an ID in our system and the CR has been approved: the validation passes and the resources are created.

changeRecord, err := ac.changeService.ValidateChange(changeID)
if err != nil {
    klog.Errorf("Change validation failed: %v", err)
    ac.respond(w, &admissionReview, false, fmt.Sprintf("Change validation failed: %v", err))
    return
}

if !changeRecord.Approved {
    klog.Infof("Change %s is not approved (status: %s)", changeID, changeRecord.Status)
    ac.respond(w, &admissionReview, false, fmt.Sprintf("Change %s is not approved (status: %s)", changeID, changeRecord.Status))
    return
}

klog.Infof("Change %s is approved, allowing deployment", changeID)
ac.respond(w, &admissionReview, true, fmt.Sprintf("Change %s approved by %s", changeID, changeRecord.Requester))

Container

To run our admission controller inside the AKS cluster we need to create a Docker container that runs our application. In the sample code you will find a Dockerfile used to build this container. We then push the container to a Docker registry, so we can consume the image when we run the webhook service.
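The sample repo's Dockerfile isn't reproduced in this post; a typical multi-stage build for a Go webhook like this might look as follows (a sketch under that assumption, not the repo's actual file):

# Build stage: compile a static Go binary
FROM golang:1.22 AS build
WORKDIR /src
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 go build -o /admission-controller .

# Minimal runtime stage: a distroless image keeps the attack surface small
FROM gcr.io/distroless/static
COPY --from=build /admission-controller /admission-controller
ENTRYPOINT ["/admission-controller"]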
Kubernetes Resources

To run our Docker container and expose a URL that the API server can call, we deploy:

- A Kubernetes Deployment
- A Kubernetes Service
- A set of RBAC roles and bindings to grant access to the Admission Controller

Finally, we deploy the webhook registration itself, which is a ValidatingWebhookConfiguration resource. This resource tells the API server:

- Where to call the webhook.
- Which operations require calling the webhook. In our demo application we validate create and update operations; if you wanted to validate that delete operations have a CR as well, you could add those too.
- Which resource types need to be validated. In our demo we are looking at Deployments, Services and ConfigMaps, but you could make this as wide or as narrow as you require.
- Which namespaces to validate. We added a namespaceSelector so the validation only applies to namespaces that have the change-validation label set to enabled. This lets us control where the policy is applied and avoid applying it to things like system namespaces, which is very important to ensure you don't break your core Kubernetes infrastructure. It also allows for differentiation between development and production namespaces, where you likely would not want to require change requests in development.
- What happens if the webhook itself cannot be called, via the failurePolicy. There are two options: Fail, which blocks the request, and Ignore, which ignores the failure and allows the resource to be created.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: change-validation-webhook
webhooks:
  - name: change-validation.company.com
    clientConfig:
      service:
        name: admission-controller
        namespace: admission-controller
        path: "/admit"
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["services", "configmaps"]
    namespaceSelector:
      matchLabels:
        change-validation: "enabled"
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    failurePolicy: Fail
```

Admission Controller In Action

Now that we have our admission controller set up, let's attempt to make a change to a resource. Using a Kubernetes Deployment, we will attempt to change the number of replicas from three to two. For this resource, the change.company.com/id annotation is set to CHG-2025-000, a change request that doesn't exist in our change management system.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: demo
  annotations:
    change.company.com/id: "CHG-2025-000"
  labels:
    app: demo-app
    environment: development
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  # (pod template omitted for brevity)
```

Once we attempt to deploy this, we quickly see that the request to update the resource is denied:

```
one or more objects failed to apply, reason: error when patching "/dev/shm/1236013741": admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found,admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.
```
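As an aside, this denial only happens because the demo namespace is opted in to validation via the label that the webhook's namespaceSelector matches on. A minimal sketch of such a namespace manifest follows; the namespace name mirrors the demo above, and nothing beyond the label is required.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: demo
  labels:
    # Opt this namespace in to the change-validation webhook
    change-validation: "enabled"
```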
Similarly, if we change the annotation to CHG-2025-999, a change request that does exist but has not been approved, we again see that the request is denied, but this time the error makes clear that the change is not approved:

```
one or more objects failed to apply, reason: error when patching "/dev/shm/28290353": admission webhook "change-validation.company.com" denied the request: Change CHG-2025-999 is not approved (status: pending),admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.
```

Finally, we update the annotation to CHG-2025-002, which has been approved. This time our deployment update succeeds and the number of replicas is reduced to two.

Next Steps

What we have created so far works as a proof of concept, confirming that an Admission Controller can do this job. To move it into production use, we would need to take a few more steps:

- Update our web API to call out to our external change management solution and retrieve real change requests (a sketch of one way to do this appears at the end of this article).
- Implement proper security for the Admission Controller, with TLS certificates and network restrictions inside the cluster.
- Implement high availability with multiple replicas, to ensure the service is always able to respond to requests.
- Implement monitoring and log collection for the service, so we are aware of any issues.
- Automate the build and release of the solution, including implementing its own set of change controls!

Conclusions

Controlling updates into production through a change control process is vital for a stable, secure and audited production environment. Ideally these CR processes happen early in the release pipeline, in a clear, automated process that avoids ever reaching the point where someone tries to deploy unapproved changes into production. However, if you want to ensure that this cannot happen, and put safeguards in place so that unapproved changes are always blocked, then an Admission Controller is one way to do it.

Creating a custom Admission Controller is relatively straightforward, and it allows you to integrate your business processes into the decision on whether a resource can be deployed. A change control Admission Controller should not be your only change control process, but it can form part of your layers of control and audit.

Further Reading

- Sample Code
- Admission Control in Kubernetes
- Manage Change in the Cloud Adoption Framework
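As a starting point for the first of those next steps, here is a hedged sketch of swapping the hard-coded lookup for a call to an external change management API. The endpoint path, the response shape and the service fields are all hypothetical placeholders for whatever your change management solution actually exposes.

```go
package controller

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// ChangeRecord mirrors the fields the admission controller checks.
type ChangeRecord struct {
	ID        string `json:"id"`
	Status    string `json:"status"`
	Approved  bool   `json:"approved"`
	Requester string `json:"requester"`
}

// ChangeService calls out to an external change management system.
// baseURL and the endpoint format below are hypothetical.
type ChangeService struct {
	baseURL string
	client  *http.Client
}

// ValidateChange fetches the change request with the given ID from
// the external system, returning an error if it cannot be found or
// retrieved so that the webhook denies the deployment.
func (s *ChangeService) ValidateChange(changeID string) (*ChangeRecord, error) {
	resp, err := s.client.Get(fmt.Sprintf("%s/api/changes/%s", s.baseURL, url.PathEscape(changeID)))
	if err != nil {
		return nil, fmt.Errorf("change lookup failed: %w", err)
	}
	defer resp.Body.Close()

	if resp.StatusCode == http.StatusNotFound {
		return nil, fmt.Errorf("change record not found")
	}
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("change system returned %s", resp.Status)
	}

	var record ChangeRecord
	if err := json.NewDecoder(resp.Body).Decode(&record); err != nil {
		return nil, fmt.Errorf("decoding change record: %w", err)
	}
	return &record, nil
}
```

Because ValidateChange keeps the same signature as the demo version, the handler logic shown earlier does not need to change when you make this swap.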
Azure Verified Modules: Support Statement & Target Response Times Update

We are announcing an update to the Azure Verified Modules (AVM) support statement. This change reflects our commitment to providing clarity alongside timely and effective support for our community and AVM module consumers. These changes also prepare the way for AVM modules to be published as V1.X.X modules (a future announcement on this is coming soon 🥳 sign up to the next AVM Community Call on July 1st 2025 to learn more).

What is the new support statement?

You can find the support statement on the AVM website here: https://azure.github.io/Azure-Verified-Modules/help-support/module-support/#support-statements

For bugs/security issues:

- 5 business days for the module owner to provide a triage, a meaningful response, and an ETA for the fix/resolution (which could be past the 5 days).
- For issues that breach the 5 business days, the AVM core team will be notified and will attempt to respond to the issue within an additional 5 business days to assist in triage.
- For security issues, the Bicep or Terraform Product Groups may step in to resolve issues that remain unresolved after a further additional 5 business days.

For feature requests:

- 15 business days for a meaningful response and initial triage to understand the feature request. An ETA may be provided by the module owner if possible.

Key changes from the previous support statement

In short, there are two changes:

- Increased target response times: from 3 to 5 business days for issues, and from 3 to 5 business days for AVM core team escalation.
- Bugs/security issues are now handled separately from feature requests, with feature requests getting a 15 business day target response time.

The previous support statement outlined a more rigid structure for issue triage and resolution. It required module owners/contributors to respond within 3 business days, with the AVM core team stepping in if there was no response within a further 24 hours. In the event of a security issue being unaddressed after 5 business days, escalation to the product group (Bicep/Terraform) would occur to assist the AVM core team. There was also no differentiation between bugs/security issues and feature requests, which there now is.

You can view the git diff of the support statement here.

Why the changes?

Being honest, we weren't meeting the previous support statement 100% of the time across all the AVM modules, which is what we strive for. We heard from you that that wasn't ideal, and we agree wholeheartedly. So we took a step back, reflected, looked at the available data, and huddled together to redefine what the new AVM support statement and targets should be.

"Yeah, but why can't you just meet the previous support statement and targets?"

This is a very valid question that you may be wondering about. We want to be honest with you, so here are the reasons why this isn't possible today:

- Module owners are not 100% dedicated to only supporting their AVM modules; they also have other daily roles and responsibilities in their jobs at Microsoft. Sometimes this means conflicting priorities, and they have to make a priority call.
- We underestimated the impact of holidays, annual leave, public holidays etc.
- The AVM core team's responsibility is not to resolve all module issues/requests, as they are a smaller team driving the AVM framework, specs, tooling and tests.
  That said, they will of course step in when needed, as they have done to date 👍
- We don't get as many contributions from the open-source community as we expected, and we would still love to see more 😉 For clarity, we always love to see a Pull Request that helps us add new features or resolve bugs and issues, even for simple things like typos. It really does help us go faster 🏃➡️

"How are you going to try and avoid changing (increasing) the support statement and targets in the future?"

Again, another very valid question! We reflected on this while making the changes to the support statement we are announcing here. To reduce this risk, we are also taking the following actions today:

- Building new internal tooling and dashboards for module owners to discover, track and monitor their issues and pull requests across the various modules they may own, across multiple languages (already complete and published 👍). This tooling will also help the AVM core team track issues and report on them more easily, helping module owners avoid non-compliance with the targets.
- Continuing to push for, promote, and encourage open-source community contributions.
- Preventing AVM modules from being published as V1.X.X if they are unable to prove compliance with the new support statement and targets (a sneak peek into the V1.X.X requirements).

Looking further into the future, we are also investigating:

- Building a dedicated AVM team, separate from the AVM core team, that will triage, work on, and fix/resolve issues that are nearing or breaching the support statement and targets. This team will also look into feature requests that module owners cannot prioritize in the near future due to other commitments, or that are popular/heavily upvoted, as time allows.
- Seeing where AI and other automation tooling can assist with issue triage and resolution to reduce module owner workload.

Summary

We hope this gives you a clear understanding of the changes to the AVM support statement and targets, and why we are making them. We also hope you appreciate our honesty about the situation and can see that we are taking action to improve things, while amending our support statements to be more realistic based on the past two years of launching and running AVM.

Finally, we want to reassure everyone that we remain committed to AVM and have big plans for the rest of the calendar year and beyond! 😎 With this in mind, we remind you to sign up to the next AVM Community Call on July 1st 2025 to learn more and ask any questions on this topic, or anything else AVM related, with the rest of the community 👍

Thanks

The AVM Core Team