best practices
Quest 9: I want to use a ready-made template
Building robust, scalable AI apps is tough, especially when you want to move fast, follow best practices, and avoid getting bogged down in endless setup and configuration. In this quest, you’ll discover how to accelerate your journey from prototype to production by leveraging ready-made templates and modern cloud tools. Say goodbye to decision fatigue and hello to streamlined, industry-approved workflows you can make your own.

👉 Want to catch up on the full program or grab more quests? https://aka.ms/JSAIBuildathon
💬 Got questions or want to hang with other builders? Join us on Discord: head to the #js-ai-build-a-thon channel.

🚀 What You’ll Build
A fully functional AI application deployed on Azure, customized to solve a real problem that matters to you.
A codebase powered by a production-grade template, complete with all the necessary infrastructure-as-code, deployment scripts, and best practices already baked in.
Your own proof-of-concept or MVP, ready to scale or show off to the world.

🛠️ What You Need
✅ GitHub account
✅ Visual Studio Code
✅ Node.js
✅ Azure subscription (free trials and student credits available)
✅ Azure Developer CLI (azd)
✅ The curiosity to solve a meaningful problem!

🧩 Concepts You’ll Explore
Azure Developer CLI (azd)
Learn how azd, the developer-first command-line tool, simplifies authentication, setup, deployment, and teardown for Azure apps. With intuitive commands like azd up and azd deploy, you can go from zero to running in the cloud, no deep cloud expertise required.
Production-Ready Templates
Explore a gallery of customizable templates designed to get your app up and running fast. These templates aren’t just “hello world”: they feature scalable architectures, sample code, and reusable infrastructure assets to launch everything from chatbots to RAG apps to full-stack solutions.
Infrastructure as Code (IaC)
See how every template bundles configuration files and scripts to automatically provision the cloud resources you need.
You’ll get a taste of how top teams ship secure, repeatable, and maintainable systems without manually clicking through Azure dashboards.
Best Practices by Default
Templates incorporate industry best practices for code structure, deployment, and scalability. You’ll spend less time researching how to “do it right” and more time customizing your application to fit your unique use case.
Customization for Real-World Problems
Pick a template and make it yours! Whether you’re building a copilot, a chat-enabled app, or a serverless API, you’ll learn how to tweak the frontend, swap out backend logic, connect your own data sources, and shape the solution to solve a real-world problem you care about.

🌟 Bonus Resources
Here are some additional resources to help you learn more about the Azure Developer CLI (azd) and the templates available:
Kickstart JS/TS projects with azd Templates
Kickstart your JavaScript projects with azd on YouTube

⏭️ What next?
With production-ready templates and the Azure Developer CLI at your side, you’re ready to move from “just an idea” to a deployable, scalable solution without reinventing the wheel. Start with the right foundation, customize with confidence, and ship your next AI app like a pro! Once your project is done, make sure you submit it to the GitHub repository Azure-Samples/JS-AI-Build-a-thon.

Superfast using Web App and Managed Identity to invoke Function App triggers
TOC
Introduction
Setup
References

1. Introduction
Many enterprises prefer not to use App Keys to invoke Function App triggers, because these fixed strings might be exposed. The method described here invokes Function App triggers using Managed Identity instead, for enhanced security. I will provide examples in both Bash and Node.js.

2. Setup
1. Create a Linux Python 3.11 Function App
1.1. Configure Authentication to block unauthenticated callers while allowing the Web App’s Managed Identity to authenticate.
Identity Provider: Microsoft
Choose a tenant for your application and its users: Workforce Configuration
App registration type: Create
Name: [automatically generated]
Client Secret expiration: [choose what fits your business purpose]
Supported Account Type: Any Microsoft Entra Directory - Multi-Tenant
Client application requirement: Allow requests from any application
Identity requirement: Allow requests from any identity
Tenant requirement: Use default restrictions based on issuer
Token store: [checked]
1.2. Create an anonymous trigger. Since your app is already protected by the App Registration, additional Function App-level protection is unnecessary; with any other authorization level, callers would also need a Function Key to trigger it.
1.3. Once the Function App is configured, try accessing the endpoint directly. You should receive a 401 Unauthorized error, confirming that triggers cannot be accessed without proper Managed Identity authorization.
1.4. After making these changes, wait 10 minutes for the settings to take effect.

2. Create a Linux Node.js 20 Web App, Obtain an Access Token, and Invoke the Function App Trigger Using the Web App (Bash Example)
2.1. Enable System Assigned Managed Identity in the Web App settings.
2.2. Open the Kudu SSH Console for the Web App.
2.3. Run the following commands, making the necessary modifications:
subscriptionsID → Replace with your Subscription ID.
resourceGroupsID → Replace with your Resource Group ID.
application_id_uri → Replace with the Application ID URI from your Function App’s App Registration.
https://az-9640-myfa.azurewebsites.net/api/my_test_trigger → Replace with the corresponding Function App trigger URL.

    # Set these to match your target resources
    subscriptionsID="01d39075-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
    resourceGroupsID="XXXX"
    application_id_uri="api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

    # Variable settings (no need to change)
    identityEndpoint="$IDENTITY_ENDPOINT"
    identityHeader="$IDENTITY_HEADER"

    # Install the necessary tool
    apt install -y jq

    # Get an access token from the Web App's managed identity endpoint
    tokenUri="${identityEndpoint}?resource=${application_id_uri}&api-version=2019-08-01"
    accessToken=$(curl -s -H "Metadata: true" -H "X-IDENTITY-HEADER: $identityHeader" "$tokenUri" | jq -r '.access_token')
    echo "Access Token: $accessToken"

    # Run the trigger
    response=$(curl -s -o response.json -w "%{http_code}" -X GET "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger" -H "Authorization: Bearer $accessToken")
    echo "HTTP Status Code: $response"
    echo "Response Body:"
    cat response.json

2.4. If everything is set up correctly, you should see a successful invocation result.

3. Invoke the Function App Trigger Using Web App (Node.js Example)
I have also provided a Node.js example, which you can modify as needed, save to /home/site/wwwroot/callFunctionApp.js, and run:

    cd /home/site/wwwroot/
    vi callFunctionApp.js
    npm init -y
    npm install @azure/identity axios
    node callFunctionApp.js

    // callFunctionApp.js
    const { DefaultAzureCredential } = require("@azure/identity");
    const axios = require("axios");

    async function callFunctionApp() {
      try {
        const applicationIdUri = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; // Change here
        const credential = new DefaultAzureCredential();

        // Acquire a token for the Function App's App Registration
        console.log("Requesting token...");
        const tokenResponse = await credential.getToken(applicationIdUri);
        if (!tokenResponse || !tokenResponse.token) {
          throw new Error("Failed to acquire access token");
        }
        const accessToken = tokenResponse.token;
        console.log("Token acquired:", accessToken);

        // Call the trigger with the token as a Bearer credential
        const apiUrl = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"; // Change here
        console.log("Calling the API now...");
        const response = await axios.get(apiUrl, {
          headers: {
            Authorization: `Bearer ${accessToken}`,
          },
        });
        console.log("HTTP Status Code:", response.status);
        console.log("Response Body:", response.data);
      } catch (error) {
        console.error("Failed to call the function", error.response ? error.response.data : error.message);
      }
    }

    callFunctionApp();

[Figure: execution result]

3. References
Tutorial: Managed Identity to Invoke Azure Functions | Microsoft Learn
How to Invoke Azure Function App with Managed Identity | by Krizzia 🤖 | Medium
Configure Microsoft Entra authentication - Azure App Service | Microsoft Learn

Announcing the General Availability of New Availability Zone Features for Azure App Service
What are Availability Zones?
Availability Zones, or zone redundancy, refers to the deployment of applications across multiple availability zones within an Azure region. Each availability zone consists of one or more data centers with independent power, cooling, and networking. By leveraging zone redundancy, you can protect your applications and data from data center failures, ensuring uninterrupted service.

Key Updates
The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two, while still maintaining a 99.99% SLA.
Many existing App Service plans with two or more instances will automatically support Availability Zones without additional setup.
The zone redundant setting for App Service plans and App Service Environment v3 is now mutable throughout the life of the resources.
Enhanced visibility into Availability Zone information, including physical zone placement and zone counts, is now provided.
For App Service Environment v3, the minimum instance fee for enabling Availability Zones has been removed, aligning the pricing model with the multi-tenant App Service offering.

The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two. You can now enjoy the benefits of Availability Zones with just two instances since we continue to uphold a 99.99% SLA even with the two-instance configuration.

Many existing App Service plans with two or more instances will automatically support Availability Zones without necessitating additional setup. Over the past few years, efforts have been made to ensure that the App Service footprint supports Availability Zones wherever possible, and we’ve made significant gains in doing so. Therefore, many existing customers can enable Availability Zones on their current deployments without needing to redeploy.
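To make the eligibility rules concrete, here is a minimal sketch of the check this post describes: a plan can turn on zone redundancy only if it runs at least two instances and its deployment footprint supports more than one zone. The function name and the object shape are hypothetical and are not part of any Azure SDK; `maximumNumberOfZones` stands in for the MaximumNumberOfZones property covered in the FAQ.

```javascript
// Hypothetical helper, not an Azure API: encodes the two rules stated in the post.
function canEnableZoneRedundancy({ instanceCount, maximumNumberOfZones }) {
  if (maximumNumberOfZones <= 1) {
    // Single-zone footprint: the post says a redeploy is required.
    return { eligible: false, reason: "Footprint supports only one zone; redeploy first." };
  }
  if (instanceCount < 2) {
    // Zone redundancy needs at least two instances.
    return { eligible: false, reason: "Scale out to at least two instances first." };
  }
  return { eligible: true, reason: "Two or more instances on a multi-zone footprint." };
}

console.log(canEnableZoneRedundancy({ instanceCount: 2, maximumNumberOfZones: 3 }));
console.log(canEnableZoneRedundancy({ instanceCount: 1, maximumNumberOfZones: 3 }));
```

In practice the two inputs would come from the plan's properties as reported by ARM/Bicep or the Azure CLI; how you read them is left out here on purpose.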
Along with supporting 2-instance Availability Zone configuration, we have enabled Availability Zones on the App Service footprint in regions where only two zones may be available. Previously, enabling Availability Zones required a region to have three zones with sufficient capacity. To account for the growing demand, we now support Availability Zone deployments in regions with just two zones. This allows us to provide you with Availability Zone features across more regions. And with that, we are upholding the 99.99% SLA even with the 2-zone configuration. Additionally, we are pleased to announce that the zone redundant setting (zoneRedundant property) for App Service plans and App Service Environment v3 is now mutable throughout the life of these resources. This enhancement allows customers on Premium V2, Premium V3, or Isolated V2 plans to toggle zone redundancy on or off as required. With this capability, you can reduce costs and scale to a single instance when multiple instances are not necessary. Conversely, you can scale out and enable zone redundancy at any time to meet your requirements. This ability has been requested for a while now and we are excited to finally make it available. For App Service Environment v3 users, this also means that your individual App Service plan zone redundancy status is now independent of other plans in your App Service Environment. This means that you can have a mix of zone redundant and non-zone redundant plans in an App Service Environment, something that was previously not supported. In addition to these new features, we also have a couple of other exciting things to share. We are now providing enhanced visibility into Availability Zone information, including the physical zone placement of your instances and zone counts. For our App Service Environment v3 customers, we have removed the minimum instance fee for enabling Availability Zones. This means that you now only pay for the Isolated V2 instances you consume. 
This aligns the pricing model with the multi-tenant App Service offering.

For more information as well as guidance on how to use these features, see the docs - Reliability in Azure App Service. Azure Portal support for these new features will be available by mid-June 2025. In the meantime, see the documentation to use these new features with ARM/Bicep or the Azure CLI.

Also check out the BRK200 breakout session at Microsoft Build 2025, live on May 20th or anytime after via the recording, where my team and I will be discussing these new features and many more exciting announcements for Azure App Service. If you’re in the Seattle area and attending Microsoft Build 2025 in person, come meet my team and me at our Expert Meetup Booth.

FAQ
Q: What are availability zones?
Availability zones are physically separate locations within an Azure region, each consisting of one or more data centers with independent power, cooling, and networking. Deploying applications across multiple availability zones ensures high availability and business continuity.

Q: How do I enable Availability Zones for my existing App Service plan or App Service Environment v3?
There is a new toggle in the Azure portal that will be enabled if your App Service plan or App Service Environment v3 supports Availability Zones. Your deployment must be on the App Service footprint that supports zones in order to have this capability. There is a new property called “MaximumNumberOfZones”, which indicates the number of zones your deployment supports. If this value is greater than one, you are on the footprint that supports zones and can enable Availability Zones as long as you have two or more instances. If this value is equal to one, you need to redeploy. Note that we are continually working to expand the zone footprint across more App Service deployments.

Q: Is there an additional charge for Availability Zones?
There is no additional charge; you only pay for the instances you use. The only requirement is that you use two or more instances.

Q: Can I change the zone redundant property after creating my App Service plan?
Yes, the zone redundant property is now mutable, meaning you can toggle it on or off at any time.

Q: How can I verify the zone redundancy status of my App Service Plans?
We now display the physical zone for each instance, helping you verify zone redundancy status for audits and compliance reviews.

Q: How do I use these new features?
You can use ARM/Bicep or the Azure CLI at this time. Starting in mid-June, Azure Portal support should be available. The documentation currently shows how to use ARM/Bicep and the Azure CLI to enable these features. The documentation as well as this blog post will be updated once Azure Portal support is available.

Q: Are Availability Zones supported on Premium V4?
Yes! See the documentation for more details on how to get started with Premium V4 today.

Optimizing Resource Allocation with Microsoft Defender CSPM
This article is part of our series on “Strategy to Execution: Operationalizing Microsoft Defender CSPM.” If you’re new to the series, or want broader strategic context, begin with our main overview article, then explore Article 1, Article 2, and Article 3 for details on risk identification, compliance, and DevSecOps workflows.

Introduction
Organizations today face an array of challenges in their cloud security efforts: ever-growing multicloud infrastructures, finite budgets, and evolving threat landscapes. Effectively allocating limited resources is critical: security teams must prioritize the vulnerabilities posing the highest risk while avoiding spending precious time and money on lower-priority issues.

Defender CSPM (Cloud Security Posture Management) provides a data-driven approach to this problem. By continuously analyzing the security posture across Azure, AWS, and GCP, Defender CSPM calculates risk scores based on factors such as business impact, exposure, and potential exploitability. Armed with these insights, security teams can make informed decisions about where to focus resources, maximizing impact and reducing their overall risk.

In this fourth and final article of our series, we’ll examine how to operationalize resource allocation with Defender CSPM. We’ll discuss the common allocation challenges, explain how CSPM’s risk-based prioritization helps address them, and provide practical steps to implement an effective allocation strategy.

Why Resource Allocation Matters in Multicloud Security
Resource allocation is critical in multicloud security because securing environments that span multiple cloud providers introduces unique challenges that require careful planning. Before you can decide where to invest your time, budget, and headcount, you need to understand the hurdles that make multicloud allocation especially tough:

Overwhelming Volume of Vulnerabilities
Modern cloud environments are rife with potential vulnerabilities.
Multicloud setups compound this challenge by introducing platform-specific risks. Without a clear prioritization method, teams risk tackling too many issues at once, often leaving truly critical threats under-addressed.

Competing Priorities Across Teams
Security, DevOps, and IT teams frequently have diverging goals. Security may emphasize high-risk vulnerabilities, while DevOps focuses on uptime and rapid releases. Aligning everyone on which vulnerabilities matter most ensures strategic clarity and reduces internal friction.

Limited Budgets and Skilled Personnel
Constrained cybersecurity budgets and headcount force tough decisions about which fixes or upgrades to fund. By focusing on vulnerabilities that present the highest risk to the business, organizations can make the most of available resources.

Lack of Centralized Visibility
Monitoring and correlating vulnerabilities across multiple cloud providers can be time-intensive and fragmented. Without a unified view, it’s easy to miss critical issues or duplicate remediation efforts, both of which squander limited resources.

How Defender CSPM Enables Risk-Based Resource Allocation
To address the complex task of resource allocation in sprawling, multicloud estates, security teams need more than raw vulnerability data; they need a system that continually filters, enriches, and ranks findings by real-world impact. Microsoft Defender CSPM equips security teams with automated, prioritized insights and unified visibility. It brings together telemetry from Azure, AWS, and GCP, applies advanced analytics to assess which weaknesses pose the greatest danger, and then packages those insights into clear, actionable priorities.
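As a toy illustration of what ranking findings by real-world impact can look like, the sketch below combines the three factors this article names (business impact, exposure, exploitability) into a single score and sorts findings by it. This is not Defender CSPM’s actual scoring algorithm: the weights, the 0-to-1 scales, and every name in the code are assumptions chosen purely for illustration.

```javascript
// Illustrative only: weights and field names are assumptions, not Defender CSPM's model.
const WEIGHTS = { businessImpact: 0.5, exposure: 0.3, exploitability: 0.2 };

// Each factor is expected on a 0..1 scale, so the weighted sum is also 0..1.
function riskScore(finding) {
  return (
    WEIGHTS.businessImpact * finding.businessImpact +
    WEIGHTS.exposure * finding.exposure +
    WEIGHTS.exploitability * finding.exploitability
  );
}

// Return a new array ordered from highest to lowest risk (input is not mutated).
function rankFindings(findings) {
  return [...findings].sort((a, b) => riskScore(b) - riskScore(a));
}

const findings = [
  { id: "internal-test-vm", businessImpact: 0.2, exposure: 0.1, exploitability: 0.9 },
  { id: "public-customer-db", businessImpact: 0.9, exposure: 1.0, exploitability: 0.6 },
  { id: "staging-api", businessImpact: 0.5, exposure: 0.6, exploitability: 0.4 },
];

// Prints the finding ids, most urgent first.
console.log(rankFindings(findings).map((f) => f.id));
```

Note how the easily exploitable but low-impact test VM ends up last: weighting business impact and exposure keeps attention on the public, customer-facing resource.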
The following capabilities form the backbone of a risk-based allocation strategy:

Risk Scoring and Prioritization
Defender CSPM continuously evaluates vulnerabilities and security weaknesses, assigning each one a risk score informed by:
Business Impact – How vital a resource or application is to daily operations.
Exposure – Whether a resource is publicly accessible or holds sensitive data.
Exploitability – Contextual factors (configuration, known exploits, network paths) that heighten or lower a vulnerability’s real-world risk.
This approach ensures that resources (time, budget, and staff) are channeled toward the issues that most endanger the organization.

Centralized Visibility Across Clouds
Multicloud support means you can view vulnerabilities across Azure, AWS, and GCP in a single pane of glass. This unified perspective helps teams avoid duplicative efforts and ensures each high-risk finding is appropriately addressed, no matter the platform.

Automated, Context-Aware Insights
Manual vulnerability evaluations are time-consuming and prone to oversight. Defender CSPM automates the risk-scoring process, updating risk levels as new vulnerabilities arise or resources change, so teams can act promptly on the most critical gaps.

Tailored Remediation Guidance
In addition to highlighting high-risk issues, Defender CSPM provides recommended steps to fix them, such as applying patches, adjusting access controls, or reconfiguring cloud resources. Having guided instructions accelerates remediation efforts and reduces the potential for human error.

Step-by-Step: Operationalizing Resource Allocation with Defender CSPM
Below is a practical workflow integrating both the strategic and operational aspects of allocating resources effectively.

Step 1: Build a Risk Assessment Framework
Identify Business-Critical Assets
Collaborate with business leaders, application owners, and architects to label high-priority workloads (e.g., production apps, data stores with customer information).
Use resource tagging (Azure tags, AWS tags, GCP labels) to systematically mark essential resources.
Align Defender CSPM’s Risk Scoring with Business Impact
Customize Defender CSPM’s scoring model to reflect your organization’s unique risk tolerance.
Set up periodic risk-scoring workshops with security, compliance, and business stakeholders to keep definitions current.
Categorize Vulnerabilities
Group vulnerabilities into critical, high, medium, or low, based on the assigned risk score.
Establish remediation SLAs for each severity level (e.g., 24-48 hours for critical; 7-14 days for medium).

Step 2: Allocate Budgets and Personnel Based on Risk
Prioritize Funding for High-Risk Issues
Work with finance or procurement to ensure the biggest threats receive adequate budget. This may cover additional tooling, specialized consulting, or staff training.
If a public-facing resource with sensitive data is flagged, you might immediately allocate budget for patching or additional third-party security review.
Track Resource Utilization
Monitor how much time and money go into specific vulnerabilities. Overinvesting in less severe issues can starve critical areas of necessary attention.
Use dashboards in Power BI or similar tools to visualize resource allocation versus risk impact.
Define Clear SLAs
Set more aggressive SLAs for higher-risk items. For instance, fix critical vulnerabilities within 24-48 hours to minimize dwell time.
Align your ticketing system (e.g., ServiceNow, Jira) with Defender CSPM so each newly discovered high-risk vulnerability automatically flags an urgent ticket.

Step 3: Continuously Track Metrics and Improve
Mean Time to Remediate (MTTR)
Monitor how long it takes to fix vulnerabilities after they’re identified. Strive for a shorter MTTR on top-priority issues.
Reduction in Risk Exposure
Track how many high-priority vulnerabilities are resolved over time. A downward trend indicates effective remediation.
Re-assess risk after major remediation efforts; scores should reflect newly reduced exposure.
Resource Utilization Efficiency
Compare security spending or labor hours to actual risk reduction outcomes. If you’re using valuable resources on low-impact tasks, reallocate them.
Evaluate whether your investments (tools, staff, or specialized training) are paying off in measurable risk reduction.
Compliance Improvement
For organizations under regulations like HIPAA or PCI-DSS, measure compliance posture. Defender CSPM can highlight policy violations and track improvement over time.
Benchmark Against Industry Standards
Compare your results (MTTR, risk exposure, compliance posture) against sector-specific benchmarks.
Adjust resource allocation strategies if you’re lagging behind peers.

Strategic Benefits of a Risk-Based Approach
Maximized ROI
By focusing on truly critical issues, you’ll see faster, more tangible reductions in risk for each security dollar spent.
Faster Remediation of High-Risk Vulnerabilities
With Defender CSPM’s clear rankings, teams know which issues to fix first, minimizing exposure windows for the worst threats.
Improved Collaboration
Providing a transparent, data-driven explanation for why certain vulnerabilities get priority eases friction between security, DevOps, and operations teams.
Scalable for Growth
As you add cloud workloads, CSPM’s automated scoring scales with you. You’ll always have an updated queue of the most urgent vulnerabilities to tackle.
Stronger Risk Management Posture
Continuously focusing on top risks aligns security investments with business goals and helps maintain compliance with evolving standards and regulations.

Conclusion
Resource allocation is a central concern for any organization striving to maintain robust cloud security. Microsoft Defender for Cloud’s CSPM makes these decisions more straightforward by automatically scoring vulnerabilities according to impact, exposure, and other contextual factors.
Security teams can thus prioritize their limited budgets, personnel, and time for maximum effect, reducing the window of exposure and minimizing the likelihood of critical breaches.

By following the steps outlined here (building a risk assessment framework, allocating resources proportionally to risk severity, and monitoring metrics to drive continuous improvement), you can ensure your security program remains agile and cost-effective. In doing so, you’ll align cybersecurity investments with broader business objectives, ultimately delivering measurable risk reduction in today’s dynamic, multicloud environment.

Microsoft Defender for Cloud - Additional Resources
Strategy to Execution: Operationalizing Microsoft Defender CSPM
Considerations for risk identification and prioritization in Defender for Cloud
Strengthening Cloud Compliance and Governance with Microsoft Defender CSPM
Integrating Security into DevOps Workflows with Microsoft Defender CSPM
Download the new Microsoft CNAPP eBook at aka.ms/MSCNAPP
Become a Defender for Cloud Ninja by taking the assessment at aka.ms/MDCNinja

Reviewers
Yuri Diogenes, Principal PM Manager, CxE Defender for Cloud

Why can't I see the Microsoft Teams Meeting add-in for Outlook?
We’ve heard reports that the Microsoft Teams Meeting add-in for Outlook on Windows does not show up for some users. There are several possible reasons for this, and most have simple fixes. Here are some steps to help you troubleshoot this problem.

Validating Change Requests with Kubernetes Admission Controllers
Promoting an application or infrastructure change into production often comes with a requirement to follow a change control process. This ensures that changes to production are properly reviewed and that they adhere to required approvals, change windows and QA processes. Often this change request (CR) process will be conducted using a system for recording and auditing the change request and the outcome.

When deploying a release, there will often be places in the process to go through this change control workflow. This may be as part of a release pipeline, it may be managed in a pull request or it may be a manual process. Ultimately, by the time the actual changes are made to production infrastructure or applications, they should already be approved. This relies on the appropriate controls and restrictions being in place to make sure this happens.

When it comes to the point of deploying resources into production Kubernetes clusters, they should have already been through a CR process. However, what if you wanted a way to validate that this is the case, and block anything from being deployed that does not have an approved CR, providing a backstop to ensure that no unapproved resources get deployed? Let's take a look at how we can use an Admission Controller to do this.

Admission Controllers
A Kubernetes Admission Controller is a mechanism to provide a checkpoint during a deployment that validates resources and applies rules and policies before the resource is accepted into the cluster. Any request to create, update or delete a resource is first run through any applicable admission controllers to check if it violates any of the required rules. Only if all admission controllers allow the request is it then processed. Kubernetes includes some built-in admission controllers, but you can also create your own. Custom admission controllers are essentially webhooks that are registered with the Kubernetes API server.
When such a request is processed by the API server, it calls any of these webhooks that are registered, and processes the response. When creating your own Admission Controller, you would usually implement the webhook as a pod running in the cluster.

There are three extensible admission mechanisms:
MutatingAdmissionWebhook: Can modify the incoming object before it is persisted (e.g., injecting sidecars).
ValidatingAdmissionWebhook: Can only approve or reject the request based on validation logic.
ValidatingAdmissionPolicy: Validation logic is embedded in the API, rather than requiring a separate web service.

For our scenario we are going to look at using a ValidatingAdmissionWebhook, as we only want to approve or reject a request based on its change request status.

Sample Code
In this article, we are not going to go line by line through the code for this admission controller; however, you can see an example implementation of this in this repo. In this example, we do not build out the full web service for validating change requests themselves. We have some pre-defined CR IDs with pre-configured statuses returned by the application. In a real-world implementation your web service would call out to your change management solution to get the current status of the change request. This does not impact how you would build the Admission Controller, just the business logic inside your controller.

Components
Our Admission Controller consists of several components:

Application
Our actual admission controller application, which runs an HTTP service that receives the request from the API server calling the webhook, processes it and applies business logic, and returns a response. In our example this service has been written in Go, but you can use whatever language you like. Your service must meet the API contract defined for the admission webhook.
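To illustrate that contract: the API server POSTs an AdmissionReview object to the webhook, and the webhook must reply with an AdmissionReview whose response echoes the request's uid and sets allowed to true or false. The sketch below builds such a reply as a plain JavaScript object to show the shape only (the sample service itself is written in Go). The function name buildAdmissionResponse and the sample uid are hypothetical, and returning a 403 status code on denial is an illustrative choice.

```javascript
// Build the AdmissionReview reply for a validating webhook.
// `review` is the parsed JSON body that the API server sent to the webhook.
function buildAdmissionResponse(review, allowed, message) {
  const response = {
    uid: review.request.uid, // must echo the uid of the incoming request
    allowed,
  };
  if (!allowed) {
    // Optional status that is surfaced to the user when the request is denied.
    response.status = { code: 403, message };
  }
  return {
    apiVersion: "admission.k8s.io/v1",
    kind: "AdmissionReview",
    response,
  };
}

// A trimmed-down incoming review; real requests carry the full object being admitted.
const incoming = {
  apiVersion: "admission.k8s.io/v1",
  kind: "AdmissionReview",
  request: { uid: "705ab4f5-6393-11e8-b7cc-42010a800002", operation: "CREATE" },
};

console.log(JSON.stringify(buildAdmissionResponse(incoming, false, "Change ID annotation is required"), null, 2));
```

Whatever language the service is written in, the reply has to keep this shape, which is why the choice of Go in the sample is a convenience rather than a requirement.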
Our application does the following:

Reads the incoming object from the admission request and extracts the Change ID from the change.company.com/id annotation that should be applied to the resource. We also support the argocd.argoproj.io/change-id and deployment.company.com/change-id annotations.

    import (
        "encoding/json"

        admissionv1 "k8s.io/api/admission/v1"
    )

    // extractChangeID returns the change ID from the object's annotations,
    // or an empty string if none of the supported annotations is present.
    func extractChangeID(req *admissionv1.AdmissionRequest) string {
        // Try to extract change ID from object annotations
        obj := req.Object.Raw
        var objMap map[string]interface{}
        if err := json.Unmarshal(obj, &objMap); err != nil {
            return ""
        }

        if metadata, ok := objMap["metadata"].(map[string]interface{}); ok {
            if annotations, ok := metadata["annotations"].(map[string]interface{}); ok {
                // Look for change ID in various annotation formats
                if changeID, ok := annotations["change.company.com/id"].(string); ok {
                    return changeID
                }
                if changeID, ok := annotations["argocd.argoproj.io/change-id"].(string); ok {
                    return changeID
                }
                if changeID, ok := annotations["deployment.company.com/change-id"].(string); ok {
                    return changeID
                }
            }
        }
        return ""
    }

If it does not find the required annotation, it immediately fails the validation, as no CR is present.

    if changeID == "" {
        // Reject resources without change ID annotation
        klog.Infof("No change ID found, rejecting request")
        ac.respond(w, &admissionReview, false, "Change ID annotation is required")
        return
    }

If the CR is present, it validates it. In our demo application this is checked against a hard-coded list of CRs, but in the real world, this is where you would make a call out to your external change management solution to get the CR with that ID. There are 3 possible outcomes here:
The CR ID does not match an ID in our system: the validation fails.
The CR does match an ID in our system, but this CR is not approved: the validation fails.
The CR does match an ID in our system and this CR has been approved: the validation passes and the resources are created.
    changeRecord, err := ac.changeService.ValidateChange(changeID)
    if err != nil {
        // Unknown change ID (or lookup failure): reject
        klog.Errorf("Change validation failed: %v", err)
        ac.respond(w, &admissionReview, false, fmt.Sprintf("Change validation failed: %v", err))
        return
    }

    if !changeRecord.Approved {
        // Known change, but not approved: reject
        klog.Infof("Change %s is not approved (status: %s)", changeID, changeRecord.Status)
        ac.respond(w, &admissionReview, false, fmt.Sprintf("Change %s is not approved (status: %s)", changeID, changeRecord.Status))
        return
    }

    // Known and approved change: allow
    klog.Infof("Change %s is approved, allowing deployment", changeID)
    ac.respond(w, &admissionReview, true, fmt.Sprintf("Change %s approved by %s", changeID, changeRecord.Requester))

Container
To run our Admission Controller inside the AKS cluster we need to create a Docker container that runs our application. In the sample code you will find a Dockerfile used to build this container. We then push the container to a Docker registry, so we can consume the image when we run the webhook service.

Kubernetes Resources
To run our Docker container and set up a URL that the API server can call, we will deploy:
A Kubernetes Deployment
A Kubernetes Service
A set of RBAC roles and bindings to grant access to the Admission Controller

Finally, we will deploy the actual validating webhook resource itself. This resource tells the API server:
Where to call the webhook
Which operations should require calling the webhook - in our demo application we look at create and update operations. If you wanted to validate that delete operations also had a CR, you could add that as well
Which resource types need to be validated - in our demo we are looking at Deployments, Services and ConfigMaps, but you could make this as wide or narrow as you require
Which namespaces to validate - we added a condition that only applies this validation to namespaces that have a label of change-validation set to enabled; this way we can control where this is applied and avoid applying it to things like system namespaces.
This is very important to ensure you don't break your core Kubernetes infrastructure. This also allows for differentiation between development and production namespaces, where you likely would not want to require Change Requests in development.

Finally, we define the failure policy, which controls what happens when the call to the webhook itself fails, for example because the service is unreachable or times out. There are two options:

- Fail, which blocks the resource creation
- Ignore, which ignores the failure and allows the resource to be created

    apiVersion: admissionregistration.k8s.io/v1
    kind: ValidatingWebhookConfiguration
    metadata:
      name: change-validation-webhook
    webhooks:
      - name: change-validation.company.com
        clientConfig:
          service:
            name: admission-controller
            namespace: admission-controller
            path: "/admit"
        rules:
          - operations: ["CREATE", "UPDATE"]
            apiGroups: ["apps"]
            apiVersions: ["v1"]
            resources: ["deployments"]
          - operations: ["CREATE", "UPDATE"]
            apiGroups: [""]
            apiVersions: ["v1"]
            resources: ["services", "configmaps"]
        namespaceSelector:
          matchLabels:
            change-validation: "enabled"
        admissionReviewVersions: ["v1", "v1beta1"]
        sideEffects: None
        failurePolicy: Fail

Admission Controller In Action

Now that we have our admission controller set up, let's attempt to make a change to a resource. Using a Kubernetes Deployment resource, we will attempt to change the number of replicas from three to two. For this resource, the change.company.com/id annotation is set to CHG-2025-000, which is a change request that doesn't exist in our change management system.
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: demo-app
      namespace: demo
      annotations:
        change.company.com/id: "CHG-2025-000"
      labels:
        app: demo-app
        environment: development
    spec:
      replicas: 2
      selector:
        matchLabels:
          app: demo-app

Once we attempt to deploy this, we quickly see that the request to update the resource is denied:

    one or more objects failed to apply, reason: error when patching "/dev/shm/1236013741": admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found,admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.

Similarly, if we change the annotation to CHG-2025-999, which is a change request that does exist but has not been approved, the request is again denied, but this time the error makes clear that the change is not approved:

    one or more objects failed to apply, reason: error when patching "/dev/shm/28290353": admission webhook "change-validation.company.com" denied the request: Change CHG-2025-999 is not approved (status: pending),admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.

Finally, we update the annotation to CHG-2025-002, which has been approved. This time our deployment update succeeds and the number of replicas is reduced to two.

Next Steps

What we have created so far works as a Proof of Concept to confirm that using an Admission Controller for this job will work.
To move this into production use, we'd need to take a few more steps:

- Update our web API to call out to our external change management solution and retrieve real change requests
- Implement proper security for the Admission Controller with SSL certificates and network restrictions inside the cluster
- Implement high availability with multiple replicas to ensure the service is always able to respond to requests
- Implement monitoring and log collection for our service to ensure we are aware of any issues
- Automate the build and release of this solution, including implementing its own set of change controls!

Conclusions

Controlling updates into production through a change control process is vital for a stable, secure and audited production environment. Ideally these CR processes will happen early in the release pipeline, in a clear, automated process that avoids getting to the point where anyone tries to deploy unapproved changes into production. However, if you want to ensure that this cannot happen, and put safeguards in place so that unapproved changes are always blocked, then the use of Admission Controllers is one way to do this. Creating a custom Admission Controller is relatively straightforward, and it allows you to integrate your business processes into the decision on whether a resource can be deployed or not. A change control Admission Controller should not be your only change control process, but it can form part of your layers of control and audit.

Further Reading

- Sample Code
- Admission Control in Kubernetes
- Manage Change in the Cloud Adoption Framework