best practices
Superfast using Web App and Managed Identity to invoke Function App triggers
TOC

1. Introduction
2. Setup
3. References

1. Introduction

Many enterprises prefer not to use App Keys to invoke Function App triggers, as they are concerned that these fixed strings might be exposed. This method allows you to invoke Function App triggers using Managed Identity instead, for enhanced security. I will provide examples in both Bash and Node.js.

2. Setup

1. Create a Linux Python 3.11 Function App

1.1. Configure Authentication to block unauthenticated callers while allowing the Web App's Managed Identity to authenticate:

- Identity Provider: Microsoft
- Choose a tenant for your application and its users: Workforce Configuration
- App registration type: Create
- Name: [automatically generated]
- Client Secret expiration: [choose a value that fits your business needs]
- Supported Account Type: Any Microsoft Entra Directory - Multi-Tenant
- Client application requirement: Allow requests from any application
- Identity requirement: Allow requests from any identity
- Tenant requirement: Use default restrictions based on issuer
- Token store: [checked]

1.2. Create an anonymous trigger. Since your app is already protected by the App Registration, additional Function App-level protection is unnecessary; otherwise, you would need a Function Key to trigger it.

1.3. Once the Function App is configured, try accessing the endpoint directly—you should receive a 401 Unauthorized error, confirming that triggers cannot be invoked without proper Managed Identity authorization.

1.4. After making these changes, wait 10 minutes for the settings to take effect.

2. Create a Linux Node.js 20 Web App, Obtain an Access Token, and Invoke the Function App Trigger (Bash Example)

2.1. Enable System Assigned Managed Identity in the Web App settings.

2.2. Open the Kudu SSH console for the Web App.

2.3. Run the following commands, making the necessary modifications:

- subscriptionsID → Replace with your Subscription ID.
- resourceGroupsID → Replace with your Resource Group ID.
- application_id_uri → Replace with the Application ID URI from your Function App's App Registration.
- https://az-9640-myfa.azurewebsites.net/api/my_test_trigger → Replace with your Function App trigger URL.

```bash
# Please set up the target resource to yours
subscriptionsID="01d39075-XXXX-XXXX-XXXX-XXXXXXXXXXXX"
resourceGroupsID="XXXX"

# Variable settings (no need to change)
identityEndpoint="$IDENTITY_ENDPOINT"
identityHeader="$IDENTITY_HEADER"
application_id_uri="api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# Install necessary tool
apt install -y jq

# Get access token
tokenUri="${identityEndpoint}?resource=${application_id_uri}&api-version=2019-08-01"
accessToken=$(curl -s -H "Metadata: true" -H "X-IDENTITY-HEADER: $identityHeader" "$tokenUri" | jq -r '.access_token')
echo "Access Token: $accessToken"

# Run trigger
response=$(curl -s -o response.json -w "%{http_code}" -X GET "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger" -H "Authorization: Bearer $accessToken")
echo "HTTP Status Code: $response"
echo "Response Body:"
cat response.json
```

2.4. If everything is set up correctly, you should see a successful invocation result.
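As an optional sanity check, reusing the $accessToken variable and the sample URL from the script above, you can compare the status codes with and without the token:

```bash
# Without a token the call should return 401; with the bearer token, 200.
curl -s -o /dev/null -w "no token  -> %{http_code}\n" \
  "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"
curl -s -o /dev/null -w "with token -> %{http_code}\n" \
  -H "Authorization: Bearer $accessToken" \
  "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"
```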
3. Invoke the Function App Trigger Using Web App (Node.js Example)

I have also provided a Node.js example, which you can modify accordingly, save to /home/site/wwwroot/callFunctionApp.js, and run:

```bash
cd /home/site/wwwroot/
vi callFunctionApp.js
npm init -y
npm install @azure/identity axios
node callFunctionApp.js
```

```javascript
// callFunctionApp.js
const { DefaultAzureCredential } = require("@azure/identity");
const axios = require("axios");

async function callFunctionApp() {
  try {
    const applicationIdUri = "api://9c0012ad-XXXX-XXXX-XXXX-XXXXXXXXXXXX"; // Change here
    const credential = new DefaultAzureCredential();

    console.log("Requesting token...");
    const tokenResponse = await credential.getToken(applicationIdUri);
    if (!tokenResponse || !tokenResponse.token) {
      throw new Error("Failed to acquire access token");
    }
    const accessToken = tokenResponse.token;
    console.log("Token acquired:", accessToken);

    const apiUrl = "https://az-9640-myfa.azurewebsites.net/api/my_test_trigger"; // Change here
    console.log("Calling the API now...");
    const response = await axios.get(apiUrl, {
      headers: {
        Authorization: `Bearer ${accessToken}`,
      },
    });

    console.log("HTTP Status Code:", response.status);
    console.log("Response Body:", response.data);
  } catch (error) {
    console.error("Failed to call the function", error.response ? error.response.data : error.message);
  }
}

callFunctionApp();
```

Below is my execution result:

3. References

- Tutorial: Managed Identity to Invoke Azure Functions | Microsoft Learn
- How to Invoke Azure Function App with Managed Identity | by Krizzia 🤖 | Medium
- Configure Microsoft Entra authentication - Azure App Service | Microsoft Learn

Announcing the General Availability of New Availability Zone Features for Azure App Service
What are Availability Zones?

Availability Zones, or zone redundancy, refers to the deployment of applications across multiple availability zones within an Azure region. Each availability zone consists of one or more data centers with independent power, cooling, and networking. By leveraging zone redundancy, you can protect your applications and data from data center failures, ensuring uninterrupted service.

Key Updates

- The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two, while still maintaining a 99.99% SLA.
- Many existing App Service plans with two or more instances will automatically support Availability Zones without additional setup.
- The zone redundant setting for App Service plans and App Service Environment v3 is now mutable throughout the life of the resources.
- Enhanced visibility into Availability Zone information, including physical zone placement and zone counts, is now provided.
- For App Service Environment v3, the minimum instance fee for enabling Availability Zones has been removed, aligning the pricing model with the multi-tenant App Service offering.

The minimum instance requirement for enabling Availability Zones has been reduced from three instances to two. You can now enjoy the benefits of Availability Zones with just two instances, since we continue to uphold a 99.99% SLA even with the two-instance configuration. Many existing App Service plans with two or more instances will automatically support Availability Zones without necessitating additional setup. Over the past few years, efforts have been made to ensure that the App Service footprint supports Availability Zones wherever possible, and we've made significant gains in doing so. Therefore, many existing customers can enable Availability Zones on their current deployments without needing to redeploy.

Along with supporting the 2-instance Availability Zone configuration, we have enabled Availability Zones on the App Service footprint in regions where only two zones may be available. Previously, enabling Availability Zones required a region to have three zones with sufficient capacity. To account for the growing demand, we now support Availability Zone deployments in regions with just two zones. This allows us to provide you with Availability Zone features across more regions, and with that, we are upholding the 99.99% SLA even with the 2-zone configuration.

Additionally, we are pleased to announce that the zone redundant setting (zoneRedundant property) for App Service plans and App Service Environment v3 is now mutable throughout the life of these resources. This enhancement allows customers on Premium V2, Premium V3, or Isolated V2 plans to toggle zone redundancy on or off as required. With this capability, you can reduce costs and scale to a single instance when multiple instances are not necessary. Conversely, you can scale out and enable zone redundancy at any time to meet your requirements. This ability has been requested for a while now, and we are excited to finally make it available. For App Service Environment v3 users, this also means that an individual App Service plan's zone redundancy status is now independent of other plans in the App Service Environment. You can have a mix of zone redundant and non-zone redundant plans in an App Service Environment, something that was previously not supported.

In addition to these new features, we also have a couple of other exciting things to share.
We are now providing enhanced visibility into Availability Zone information, including the physical zone placement of your instances and zone counts. For our App Service Environment v3 customers, we have removed the minimum instance fee for enabling Availability Zones. This means that you now only pay for the Isolated V2 instances you consume, which aligns the pricing model with the multi-tenant App Service offering.

For more information, as well as guidance on how to use these features, see the docs: Reliability in Azure App Service. Azure Portal support for these new features will be available by mid-June 2025. In the meantime, see the documentation to use these new features with ARM/Bicep or the Azure CLI.

Also check out the BRK200 breakout session at Microsoft Build 2025, live on May 20th or anytime after via the recording, where my team and I will be discussing these new features and many more exciting announcements for Azure App Service. If you're in the Seattle area and attending Microsoft Build 2025 in person, come meet my team and me at our Expert Meetup Booth.

FAQ

Q: What are availability zones?
A: Availability zones are physically separate locations within an Azure region, each consisting of one or more data centers with independent power, cooling, and networking. Deploying applications across multiple availability zones ensures high availability and business continuity.

Q: How do I enable Availability Zones for my existing App Service plan or App Service Environment v3?
A: There is a new toggle in the Azure portal that will be enabled if your App Service plan or App Service Environment v3 supports Availability Zones. Your deployment must be on the App Service footprint that supports zones in order to have this capability. There is a new property called "MaximumNumberOfZones", which indicates the number of zones your deployment supports. If this value is greater than one, you are on the footprint that supports zones and can enable Availability Zones as long as you have two or more instances. If this value is equal to one, you need to redeploy. Note that we are continually working to expand the zone footprint across more App Service deployments.

Q: Is there an additional charge for Availability Zones?
A: There is no additional charge; you only pay for the instances you use. The only requirement is that you use two or more instances.

Q: Can I change the zone redundant property after creating my App Service plan?
A: Yes, the zone redundant property is now mutable, meaning you can toggle it on or off at any time.

Q: How can I verify the zone redundancy status of my App Service plans?
A: We now display the physical zone for each instance, helping you verify zone redundancy status for audits and compliance reviews.

Q: How do I use these new features?
A: You can use ARM/Bicep or the Azure CLI at this time. Starting in mid-June, Azure Portal support should be available. The documentation currently shows how to use ARM/Bicep and the Azure CLI to enable these features. The documentation, as well as this blog post, will be updated once Azure Portal support is available.

Q: Are Availability Zones supported on Premium V4?
A: Yes! See the documentation for more details on how to get started with Premium V4 today.
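To make the ARM/CLI route concrete before Portal support lands, here is a hedged sketch. The resource names are placeholders, and the property paths follow the MaximumNumberOfZones and zoneRedundant properties described above; check the linked documentation for the exact syntax in your CLI version.

```bash
# Check whether the plan's footprint supports zones (a value greater than 1 means it does).
az resource show \
  --resource-group my-rg \
  --name my-plan \
  --resource-type "Microsoft.Web/serverfarms" \
  --query "properties.maximumNumberOfZones"

# Flip the now-mutable zoneRedundant property on an existing plan.
az resource update \
  --resource-group my-rg \
  --name my-plan \
  --resource-type "Microsoft.Web/serverfarms" \
  --set properties.zoneRedundant=true
```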
Validating Change Requests with Kubernetes Admission Controllers

Promoting an application or infrastructure change into production often comes with a requirement to follow a change control process. This ensures that changes to production are properly reviewed and that they adhere to required approvals, change windows, and QA processes. Often this change request (CR) process will be conducted using a system for recording and auditing the change request and the outcome.

When deploying a release, there will often be places in the process to go through this change control workflow. It may be part of a release pipeline, it may be managed in a pull request, or it may be a manual process. Ultimately, by the time the actual changes are made to production infrastructure or applications, they should already be approved. This relies on the appropriate controls and restrictions being in place to make sure this happens.

When it comes to the point of deploying resources into production Kubernetes clusters, they should have already been through a CR process. However, what if you wanted a way to validate that this is the case, and to block anything from being deployed that does not have an approved CR, providing a backstop that ensures no unapproved resources get deployed? Let's take a look at how we can use an Admission Controller to do this.

Admission Controllers

A Kubernetes Admission Controller is a mechanism that provides a checkpoint during a deployment, validating resources and applying rules and policies before a resource is accepted into the cluster. Any request to create, update, or delete (CRUD) a resource is first run through any applicable admission controllers to check whether it violates any of the required rules. Only if all admission controllers allow the request is it then processed.

Kubernetes includes some built-in admission controllers, but you can also create your own. Admission controllers are essentially webhooks that are registered with the Kubernetes API server. When a CRUD request is processed by the API server, it calls any of these webhooks that are registered and processes the response. When creating your own admission controller, you would usually implement the webhook as a pod running in the cluster.

There are three types of admission controller webhooks:

- MutatingAdmissionWebhook: Can modify the incoming object before it is persisted (e.g., injecting sidecars).
- ValidatingAdmissionWebhook: Can only approve or reject the request based on validation logic.
- ValidatingAdmissionPolicy: Validation logic is embedded in the API server, rather than requiring a separate web service.

For our scenario we are going to use a ValidatingAdmissionWebhook, as we only want to approve or reject a request based on its change request status.

Sample Code

In this article, we are not going to go line by line through the code for this admission controller; you can see an example implementation in this repo. In this example, we do not build out the full web service for validating change requests themselves. We have some pre-defined CR IDs with pre-configured statuses returned by the application. In a real-world implementation, your web service would call out to your change management solution to get the current status of the change request. This does not impact how you would build the admission controller, just the business logic inside it.
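For context on what the webhook service must return: the API server POSTs an AdmissionReview object to the webhook, and the webhook replies with an AdmissionReview carrying a response verdict. A minimal rejection response looks roughly like this (the uid must echo the one from the incoming request):

```json
{
  "apiVersion": "admission.k8s.io/v1",
  "kind": "AdmissionReview",
  "response": {
    "uid": "<uid copied from the incoming request>",
    "allowed": false,
    "status": {
      "message": "Change validation failed: change record not found"
    }
  }
}
```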
Components

Our admission controller consists of several components:

Application

The actual admission controller application, which runs an HTTP service that receives the request from the API server calling the webhook, processes it, applies business logic, and returns a response. In our example this service is written in Go, but you can use whatever language you like; your service must meet the API contract defined for the admission webhook.

Our application does the following:

1. Reads the incoming resource YAML and extracts the change ID from the change.company.com/id annotation that should be applied to the resource. We also support the argocd.argoproj.io/change-id and deployment.company.com/change-id annotations.

```go
func extractChangeID(req *admissionv1.AdmissionRequest) string {
	// Try to extract change ID from object annotations
	obj := req.Object.Raw
	var objMap map[string]interface{}
	if err := json.Unmarshal(obj, &objMap); err != nil {
		return ""
	}

	if metadata, ok := objMap["metadata"].(map[string]interface{}); ok {
		if annotations, ok := metadata["annotations"].(map[string]interface{}); ok {
			// Look for change ID in various annotation formats
			if changeID, ok := annotations["change.company.com/id"].(string); ok {
				return changeID
			}
			if changeID, ok := annotations["argocd.argoproj.io/change-id"].(string); ok {
				return changeID
			}
			if changeID, ok := annotations["deployment.company.com/change-id"].(string); ok {
				return changeID
			}
		}
	}
	return ""
}
```

2. If it does not find the required annotation, it immediately fails the validation, as no CR is present.

```go
if changeID == "" {
	// Reject resources without change ID annotation
	klog.Infof("No change ID found, rejecting request")
	ac.respond(w, &admissionReview, false, "Change ID annotation is required")
	return
}
```

3. If the CR annotation is present, it validates it. In our demo application this is checked against a hard-coded list of CRs, but in the real world this is where you would call out to your external change management solution to fetch the CR with that ID. There are three possible outcomes here:

- The CR ID does not match an ID in our system: the validation fails.
- The CR matches an ID in our system, but the CR is not approved: the validation fails.
- The CR matches an ID in our system and the CR has been approved: the validation passes and the resource is created.

```go
changeRecord, err := ac.changeService.ValidateChange(changeID)
if err != nil {
	klog.Errorf("Change validation failed: %v", err)
	ac.respond(w, &admissionReview, false, fmt.Sprintf("Change validation failed: %v", err))
	return
}

if !changeRecord.Approved {
	klog.Infof("Change %s is not approved (status: %s)", changeID, changeRecord.Status)
	ac.respond(w, &admissionReview, false, fmt.Sprintf("Change %s is not approved (status: %s)", changeID, changeRecord.Status))
	return
}

klog.Infof("Change %s is approved, allowing deployment", changeID)
ac.respond(w, &admissionReview, true, fmt.Sprintf("Change %s approved by %s", changeID, changeRecord.Requester))
```

Container

To run our admission controller inside the AKS cluster, we need to create a Docker container that runs our application. In the sample code you will find a Dockerfile used to build this container. We then push the container to a Docker registry so we can consume the image when we run the webhook service.
Kubernetes Resources

To run our Docker container and set up a URL that the API server can call, we will deploy:

- A Kubernetes Deployment
- A Kubernetes Service
- A set of RBAC roles and bindings to grant access to the admission controller

Finally, we will deploy the webhook registration itself, a ValidatingWebhookConfiguration resource. This resource tells the API server:

- Where to call the webhook.
- Which operations should require calling the webhook. In our demo application we look at create and update operations; if you also wanted to validate that delete operations have a CR, you could add that.
- Which resource types need to be validated. In our demo we are looking at Deployments, Services, and ConfigMaps, but you could make this as wide or narrow as you require.
- Which namespaces to validate. We added a condition that only applies this validation to namespaces that have a label of change-validation set to enabled. This way we can control where validation is applied and avoid applying it to things like system namespaces, which is very important to ensure you don't break your core Kubernetes infrastructure. It also allows for differentiation between development and production namespaces, as you likely would not want to require change requests in development.
- What happens when the validation call fails. There are two options: Fail, which blocks the resource creation, and Ignore, which ignores the failure and allows the resource to be created.

```yaml
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: change-validation-webhook
webhooks:
  - name: change-validation.company.com
    clientConfig:
      service:
        name: admission-controller
        namespace: admission-controller
        path: "/admit"
    rules:
      - operations: ["CREATE", "UPDATE"]
        apiGroups: ["apps"]
        apiVersions: ["v1"]
        resources: ["deployments"]
      - operations: ["CREATE", "UPDATE"]
        apiGroups: [""]
        apiVersions: ["v1"]
        resources: ["services", "configmaps"]
    namespaceSelector:
      matchLabels:
        change-validation: "enabled"
    admissionReviewVersions: ["v1", "v1beta1"]
    sideEffects: None
    failurePolicy: Fail
```
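One practical note: with the namespaceSelector above, the webhook only fires for namespaces that carry the label. For the demo namespace used in the next section, the opt-in would look like this (assuming the namespace already exists):

```bash
# Opt the namespace into change validation; the webhook ignores unlabeled namespaces.
kubectl label namespace demo change-validation=enabled
```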
Admission Controller In Action

Now that we have our admission controller set up, let's attempt to make a change to a resource. Using a Kubernetes Deployment resource, we will attempt to change the number of replicas from three to two. For this resource, the change.company.com/id annotation is set to CHG-2025-000, which is a change request that doesn't exist in our change management system.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app
  namespace: demo
  annotations:
    change.company.com/id: "CHG-2025-000"
  labels:
    app: demo-app
    environment: development
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
```

Once we attempt to deploy this, we quickly see that the request to update the resource is denied:

```
one or more objects failed to apply, reason: error when patching "/dev/shm/1236013741": admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found,admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.
```

Similarly, if we change the annotation to CHG-2025-999, which is a change request that does exist but has not been approved, we again see that the request is denied, but this time the error makes clear that it is not approved:

```
one or more objects failed to apply, reason: error when patching "/dev/shm/28290353": admission webhook "change-validation.company.com" denied the request: Change CHG-2025-999 is not approved (status: pending),admission webhook "change-validation.company.com" denied the request: Change validation failed: change record not found.
```

Finally, we update the annotation to CHG-2025-002, which has been approved. This time our deployment update succeeds, and the number of replicas is reduced to two.

Next Steps

What we have created so far works as a proof of concept, confirming that an admission controller can do this job. To move this into production use, we'd need to take a few more steps:

- Update our web API to call out to our external change management solution and retrieve real change requests.
- Implement proper security for the admission controller, with SSL certificates and network restrictions inside the cluster.
- Implement high availability with multiple replicas, to ensure the service is always able to respond to requests.
- Implement monitoring and log collection for our service, to ensure we are aware of any issues.
- Automate the build and release of this solution, including implementing its own set of change controls!

Conclusions

Controlling updates to production through a change control process is vital for stable, secure, and audited production environments. Ideally these CR processes will happen early in the release pipeline in a clear, automated process that avoids ever getting to the point where anyone tries to deploy unapproved changes into production. However, if you want to ensure that this cannot happen, and put in safeguards so that unapproved changes are always blocked, then the use of Admission Controllers is one way to do it. Creating a custom Admission Controller is relatively straightforward, and it allows you to integrate your business processes into the decision on whether a resource can be deployed. A change control Admission Controller should not be your only change control process, but it can form part of your layers of control and audit.

Further Reading

- Sample Code
- Admission Control in Kubernetes
- Manage Change in the Cloud Adoption Framework

Throughput Testing at Scale for Azure Functions
Introduction

Ensuring reliable, high-performance serverless applications is central to our work on Azure Functions. With new plans like Flex Consumption expanding the platform's capabilities, it's critical to continuously validate that our infrastructure can scale—reliably and efficiently—under real-world load. To meet that need, we built PerfBench (Performance Benchmarker), a comprehensive benchmarking system designed to measure, monitor, and maintain our performance baselines—catching regressions before they impact customers.

This infrastructure now runs close to 5,000 test executions every month, spanning multiple SKUs, regions, runtimes, and workloads—with Flex Consumption accounting for more than half of the total volume. This scale of testing helps us not only identify regressions early, but also understand system behavior over time across an increasingly diverse set of scenarios.

[Figure: … of all Python Function apps across regions (SKU: Flex Consumption, Instance Size: 2048 – 1000 VUs over 5 mins, HTML Parsing test)]

Motivation: Why We Built PerfBench

The Need for Scale

Azure Functions supports a range of triggers, from HTTP requests to event-driven flows like Service Bus or Storage Queue messages. With an ever-growing set of runtimes (e.g., .NET, Node.js, Python, Java, PowerShell) and versions (like Python 3.11 or .NET 8.0), multiple SKUs, and regions, the possible test combinations explode quickly. Manual testing or single-scenario benchmarks no longer cut it. The table below shows the current scope of coverage; it does not include the Service Bus tests.

| Plan | Pricing Tier | Distinct Test Names |
| --- | --- | --- |
| FlexConsumption | FLEX2048 | 110 |
| FlexConsumption | FLEX512 | 20 |
| Consumption | CNS | 36 |
| App Service Plan | P1V3 | 32 |
| Functions Premium | EP1 | 46 |

Table 1: Different test combinations per plan based on stack, pricing tier, scenario, etc.

The Flex Consumption Plan

There have been many iterations of this infrastructure within the team, and we've been continuously monitoring Functions performance for more than 4 years now, with more than a million runs to date. But with the introduction of the Flex Consumption plan (in preview at the time PerfBench was built), we had to redesign the testing from the ground up, as Flex Consumption unlocks new scaling behaviors and needed thorough testing—millions of messages or tens of thousands of requests per second—to ensure confidence in performance goals and to prevent regressions.

[Figure: … (SKU: Flex Consumption, Instance Size: 2048)]

PerfBench: High-Level Architecture Overview

PerfBench is composed of several key pieces:

- Resource Creator – Uses meta files and Bicep templates to deploy receiver function apps (test targets) at scale.
- Test Infra Generator – Deploys and configures the system that actually does the load generation (e.g., SBLoadGen function app, Scheduler function app, ALT webhook function).
- Test Infra – The "brain" of testing, including the Scheduler, Azure Load Testing integration, and SBLoadGen.
- Receiver Function Apps – Deployed once per combination of runtime, version, region, OS, SKU, and scenario.
- Data Aggregation & Dashboards – Gathers test metrics from Azure Load Testing (ALT) or SBLoadGen, stores them in Azure Data Explorer (ADX), and displays trends in ADX dashboards.

Below is a simplified architecture diagram illustrating these components:

Components

Resource Creator

The Resource Creator uses meta files and Jinja templates to generate Bicep templates for creating resources.

- Meta Files: We define test scenarios in simple text-based files (e.g., os.txt, runtime_version.txt, sku.txt, scenario.txt).
- Each file lists possible values (like python|3.11 or dotnet|8.0) and short codes for resource naming.
- Template Generation: A script reads these meta files and uses them to produce Bicep templates—one template per valid combination—deploying receiver function apps into dedicated resource groups.
- Filters: Regex-like patterns in a filter.txt file exclude unwanted combos, keeping the matrix manageable.
- CI/CD Flow: Whenever we add a new runtime or region, a pull request updates the relevant meta file. Once merged, our pipeline regenerates the Bicep and redeploys resources (these are idempotent updates).

Test Infra Generator

Deploys and configures the Scheduler function app, the SBLoadGen Durable Functions app, and the ALT webhook function. It follows a similar CI/CD approach: merging changes triggers the creation (or update) of these infrastructure components.

Test Infra: Load Generation, Scheduling, and Reporting

Scheduler

The conductor of the whole operation. It runs every 5 minutes and loads test configurations (test_configs.json) from Blob Storage. The configuration includes details on which tests to run, at what time (e.g., "run at 13:45 daily"), and references to either ALT for HTTP tests or SBLoadGen for non-HTTP tests, so they can be scheduled through the appropriate system. Some tests run multiple times daily, others once a day; a scheduled downtime is built in for maintenance.

HTTP Load Generator - Azure Load Testing (ALT)

We utilize Azure Functions to trigger Azure Load Testing (ALT) for HTTP-based scenarios. ALT is a production-grade load generation service that provides an easy-to-configure way to send load to different server endpoints using JMeter and Locust. We worked closely with the ALT team to optimize the JMeter scripts for different scenarios, and ALT recently completed its second year. We created an abstraction on top of ALT to provide a webhook approach to starting tests, as well as to get notified when tests finish. This was done using a custom function app that does the following:

1. Initiates a test run using a predefined JMX file.
2. Continuously polls until the test execution is complete.
3. Retrieves the test results and transforms them into the required format.
4. Transmits the formatted results to the data aggregation system.

Sample ALT test run: 8.8 million requests in under 6 minutes, with a 90th percentile response time of 80 ms and zero errors. The system maintained a throughput of 28K+ RPS. Some more details of our ALT setup:

- 25 Runtime Controllers manage the test logic and concurrency.
- 40 Engines handle actual load execution, distributing test plans.
- 1,000 clients total for 5-minute runs to measure throughput, error rates, and latency.

Test types:

- HelloWorld (GET request, to understand the baseline of the system).
- HtmlParser (POST request sending HTML for parsing, to simulate moderate CPU usage).

Service Bus Load Generator - SBLoadGen (Durable Functions)

For event-driven scenarios (e.g., Service Bus–based triggers), we built SBLoadGen. It's a Durable Function that uses the fan-out pattern to distribute work across multiple workers—each responsible for sending a portion of the total load (see the sketch below). In a typical run, we aim to generate around one million messages in under a minute to stress-test the system. We intentionally avoid a fan-in step—once messages are in flight, the system defers to the receiver function apps to process and emit the relevant telemetry.

Highlights:

- Generates ~1 million messages in under a minute.
- Durable Function apps are deployed regionally and are triggered via webhook.
- Implemented as a Python function app using the v2 programming model.

Note: This will be open sourced in the coming days.
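To illustrate the fan-out shape described above, here is a minimal hedged sketch in the Python v2 Durable Functions model. This is not the actual SBLoadGen source: the function names, the worker count, and the send_messages activity are all hypothetical.

```python
# Illustrative fan-out orchestration (Python v2 Durable Functions model).
import azure.durable_functions as df
import azure.functions as func

app = df.DFApp(http_auth_level=func.AuthLevel.FUNCTION)

@app.orchestration_trigger(context_name="context")
def load_orchestrator(context: df.DurableOrchestrationContext):
    total_messages = 1_000_000   # ~1M messages per run
    workers = 100                # fan-out width (hypothetical)
    share = total_messages // workers
    # Fan out: each activity sends its share of messages to Service Bus.
    # No fan-in/aggregation step: receiver apps emit their own telemetry.
    tasks = [context.call_activity("send_messages", share) for _ in range(workers)]
    yield context.task_all(tasks)

@app.activity_trigger(input_name="count")
def send_messages(count: int) -> int:
    # Placeholder: push `count` messages to the target Service Bus queue here.
    return count
```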
Receiver Function Apps (Test Apps)

These are the actual apps receiving all the generated load. They are deployed with different combinations and updated rarely. Each valid combination (region + OS + runtime + SKU + scenario) gets its own function app, receiving load from ALT or SBLoadGen.

HTTP scenarios:

- HelloWorld: No-op test to measure the system's overhead and establish a baseline.
- HtmlParser: POST with an HTML document for parsing (simulating a small CPU load).

Non-HTTP (Service Bus) scenario:

- CSV-to-JSON conversion plus blob storage operations, blending compute and I/O overhead.

Collected metrics:

- RPS: Requests per second, success/error rates, and latency distributions for HTTP workloads.
- MPPS: Messages processed per second and success/error rates for non-HTTP (e.g., Service Bus) workloads.

Data Aggregation & Dashboards

Capturing results at scale is just as important as generating load. PerfBenchV2 uses a modular data pipeline to reliably ingest and visualize metrics from both HTTP and Service Bus–based tests. All test results flow through Event Hubs, which act as an intermediary between the test infrastructure and our analytics platform. The webhook function (used with ALT) and the SBLoadGen app both emit structured logs that are routed through Event Hub streams and ingested into dedicated Azure Data Explorer (ADX) tables.

We use three main tables in ADX:

- HTTPTestResults for test runs executed via Azure Load Testing.
- SBLoadGenRuns for recording message counts and timing data from Service Bus scenarios.
- SchedulerRuns to log when and how each test was initiated.

On top of this telemetry, we've built custom ADX dashboards that allow us to monitor trends in latency, throughput, and error rates over time. These dashboards provide clear, actionable views into system behavior across dozens of runtimes, regions, and SKUs. Because our focus is on long-term trend analysis rather than real-time anomaly detection, this batch-oriented approach works well and reduces operational complexity.
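For a flavor of the kind of trend query such dashboards are built on, here is a hedged sketch against the HTTPTestResults table named above. The column names are illustrative assumptions, not the actual schema.

```kusto
// Illustrative 30-day latency/throughput trend per runtime and region (assumed schema).
HTTPTestResults
| where Timestamp > ago(30d)
| summarize
    p90LatencyMs = percentile(LatencyMs, 90),
    avgRps = avg(RequestsPerSecond)
  by bin(Timestamp, 1d), Runtime, Region
| render timechart
```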
CI/CD Pipeline Integration

- Continuous Updates: Once a new language version or scenario is added to the runtime_version.txt or scenario.txt meta files, the pipeline regenerates the Bicep and deploys new receiver apps. The Test Infra Generator also updates or redeploys the needed function apps (Scheduler, SBLoadGen, or ALT webhook) whenever logic changes.
- Release Confidence: We run throughput tests on these new apps early and often, catching any performance regressions before shipping to customers.

Challenges & Lessons Learned

Designing and running this infrastructure hasn't been easy, and we've learned a lot of valuable lessons on the way. Here are a few:

- Exploding Matrix - Handling every runtime, OS, SKU, region, and scenario can lead to thousands of permutations. Meta files and a robust filter system help keep this under control, but it remains an ongoing effort.
- Cloud Transience - With ephemeral infrastructure, tests sometimes fail due to network hiccups or short-lived capacity constraints. We built in retries and redundancy to mitigate transient failures.
- Early Adoption - PerfBench was among the first heavy "customers" of the new Flex Consumption plan. At times, we had to wait for Bicep features or platform fixes—but it gave us great insight into the plan's real-world performance.
- Maintenance & Cleanup - When certain stacks or SKUs near end-of-life, we have to decommission their resources, which also means regular grooming of the meta files and filter rules.

Success Stories

- Proactive Regression Detection: PerfBench surfaced critical performance regressions early—often before they could impact customers. These insights enabled timely fixes and gave us confidence to move forward with the General Availability of Flex Consumption.
- Production-Level Confidence: By continuously running tests across live production regions, PerfBench provided a realistic view of system behavior under load. This allowed the team to fine-tune performance, eliminate bottlenecks, and achieve improvements measured in single-digit milliseconds.
- Influencing Product Evolution: As one of the first large-scale internal adopters of the Flex Consumption plan, PerfBench served as a rigorous validation tool. The feedback it generated played a direct role in shaping feature priorities and improving platform reliability—well before broader customer adoption.

Future Directions

- Open sourcing: We are in the process of open sourcing all the relevant parts of PerfBench - SBLoadGen, the Bicep template generator, etc.
- Production Synthetic Validation and Alerting: Adapting PerfBench's resource generation approach for ongoing synthetic tests in production, ensuring real environments consistently meet performance SLOs. This will also open up alerting and monitoring scenarios across the production fleet.
- Expanding Trigger Coverage and Variations: Exploring additional triggers, like Storage Queue or Event Hubs triggers, to broaden test coverage, and testing different settings within the same scenario (e.g., larger payloads, concurrency changes).

Conclusion

PerfBench underscores our commitment to high-performance Azure Functions. By automating test app creation (via meta files and Bicep), orchestrating load (via ALT and SBLoadGen), and collecting data in ADX, we maintain a continuous pulse on throughput. This approach has already proven invaluable for Flex Consumption, and we're excited to expand scenarios and triggers in the future. For more details on Flex Consumption and other hosting plans, check out the Azure Functions documentation. We hope the insights shared here spark ideas for your own large-scale performance testing needs, whether on Azure Functions or any other distributed cloud service.

Acknowledgements

We'd like to acknowledge the entire Functions Platform and Tooling teams for their foundational work in enabling this testing infrastructure. Special thanks to the Azure Load Testing (ALT) team for their continued support and collaboration. And finally, sincere appreciation to our leadership for making performance a first-class engineering priority across the stack.

Further Reading

- Azure Functions
- Azure Functions Flex Consumption Plan
- Azure Durable Functions
- Azure Functions Python Developer Reference Guide
- Azure Functions Performance Optimizer
- Example case study: GitHub and Azure Functions
- Azure Load Testing Overview
- Azure Data Explorer Dashboards

If you have any questions or want to share your own performance testing experiences, feel free to reach out in the comments!

Azure Kubernetes Service Baseline - The Hard Way, Third Time's a Charm
1 Access management

Azure Kubernetes Service (AKS) supports Microsoft Entra ID integration, which allows you to control access to your cluster resources using Azure role-based access control (RBAC). In this tutorial, you will learn how to integrate AKS with Microsoft Entra ID and assign different roles and permissions to three types of users:

- An admin user, who will have full access to the AKS cluster and its resources.
- A backend ops team, responsible for managing the backend application deployed in the AKS cluster. They will only have access to the backend namespace and the resources within it.
- A frontend ops team, responsible for managing the frontend application deployed in the AKS cluster. They will only have access to the frontend namespace and the resources within it.

By following this tutorial, you will be able to implement the least-privilege access model, which means that each user or group will only have the minimum permissions required to perform their tasks.

1.1 Introduction

In this third part of the blog series, you will learn how to harden your AKS cluster:

- Update an existing AKS cluster to enable Microsoft Entra ID integration.
- Create a Microsoft Entra ID admin group and assign it the Azure Kubernetes Service Cluster Admin Role.
- Create a Microsoft Entra ID backend ops group and assign it the Azure Kubernetes Service Cluster User Role.
- Create a Microsoft Entra ID frontend ops group and assign it the Azure Kubernetes Service Cluster User Role.
- Create users in Microsoft Entra ID.
- Create role bindings to grant the backend ops group and the frontend ops group access to their respective namespaces.
- Test the access of each user type by logging in with different credentials and running kubectl commands.

1.2 Prerequisites

This section outlines the recommended prerequisites for setting up Microsoft Entra ID with AKS. It is highly recommended to complete Azure Kubernetes Service Baseline - The Hard Way, or to follow the Microsoft official documentation for a quick start. Note that you will need to create two namespaces in Kubernetes: one called frontend and one called backend.

1.3 Target Architecture

Throughout this article, this is the target architecture we will aim to create. All procedures will be conducted using the Azure CLI. The current architecture can be visualized as follows:

1.4 Deployment

1.4.1 Prepare Environment Variables

This code defines the environment variables for the resources that you will create later in the tutorial.

Note: Ensure the environment variable $STUDENT_NAME and the placeholder <TENANT SUB DOMAIN NAME> are set before running the code below.
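For example (the value here is only a placeholder):

```bash
# Hypothetical value: use your own student name suffix.
STUDENT_NAME="student01"
```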
```bash
# Define the name of the admin group
ADMIN_GROUP='ClusterAdminGroup-'${STUDENT_NAME}

# Define the name of the frontend operations group
OPS_FE_GROUP='Ops_Frontend_team-'${STUDENT_NAME}

# Define the name of the backend operations group
OPS_BE_GROUP='Ops_Backend_team-'${STUDENT_NAME}

# Define the Azure AD UPN (User Principal Name) for the frontend operations user
AAD_OPS_FE_UPN='opsfe-'${STUDENT_NAME}'@<SUB DOMAIN TENANT NAME HERE>.onmicrosoft.com'

# Define the display name for the frontend operations user
AAD_OPS_FE_DISPLAY_NAME='Frontend-'${STUDENT_NAME}

# Placeholder for the frontend operations user password
AAD_OPS_FE_PW=<ENTER USER PASSWORD>

# Define the Azure AD UPN for the backend operations user
AAD_OPS_BE_UPN='opsbe-'${STUDENT_NAME}'@<SUB DOMAIN TENANT NAME HERE>.onmicrosoft.com'

# Define the display name for the backend operations user
AAD_OPS_BE_DISPLAY_NAME='Backend-'${STUDENT_NAME}

# Placeholder for the backend operations user password
AAD_OPS_BE_PW=<ENTER USER PASSWORD>

# Define the Azure AD UPN for the cluster admin user
AAD_ADMIN_UPN='clusteradmin'${STUDENT_NAME}'@<SUB DOMAIN TENANT NAME HERE>.onmicrosoft.com'

# Placeholder for the cluster admin user password
AAD_ADMIN_PW=<ENTER USER PASSWORD>

# Define the display name for the cluster admin user
AAD_ADMIN_DISPLAY_NAME='Admin-'${STUDENT_NAME}
```

1.4.2 Create Microsoft Entra ID Security Groups

We will now start by creating three security groups, one for each team.

1. Create the security group for Cluster Admins:

```bash
az ad group create --display-name $ADMIN_GROUP --mail-nickname $ADMIN_GROUP
```

2. Create the security group for the Application Operations Frontend Team:

```bash
az ad group create --display-name $OPS_FE_GROUP --mail-nickname $OPS_FE_GROUP
```

3. Create the security group for the Application Operations Backend Team:

```bash
az ad group create --display-name $OPS_BE_GROUP --mail-nickname $OPS_BE_GROUP
```

The current architecture can now be illustrated as follows:

1.4.3 Integrate AKS with Microsoft Entra ID

1. Let's update our existing AKS cluster to support Microsoft Entra ID integration, configure a cluster admin group, and disable local admin accounts in AKS, as this will prevent anyone from using the --admin switch to get full cluster credentials.

```bash
az aks update -g $SPOKE_RG -n $AKS_CLUSTER_NAME-${STUDENT_NAME} --enable-azure-rbac --enable-aad --disable-local-accounts
```

The current architecture can now be described as follows:
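Before moving on, you can optionally confirm the cluster's authentication posture. This is a hedged sketch; the exact output field names can vary by CLI version:

```bash
# Optional check: Entra ID integration, Azure RBAC, and local account status.
az aks show -g $SPOKE_RG -n $AKS_CLUSTER_NAME-${STUDENT_NAME} \
  --query "{aadProfile: aadProfile, localAccountsDisabled: disableLocalAccounts}"
```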
1.4.4 Scope and Role Assignment for Security Groups

This chapter describes how to create the scope for the operations teams to perform their daily tasks. The scope is based on the AKS resource ID plus a fixed path in AKS, which is /namespaces/<namespace>. The scope will confine the Application Operations Frontend Team to the frontend namespace and the Application Operations Backend Team to the backend namespace.

1. Let's start by constructing the scopes for the operations teams:

```bash
AKS_BACKEND_NAMESPACE='/namespaces/backend'
AKS_FRONTEND_NAMESPACE='/namespaces/frontend'
AKS_RESOURCE_ID=$(az aks show -g $SPOKE_RG -n $AKS_CLUSTER_NAME-${STUDENT_NAME} --query 'id' --output tsv)
```

2. Let's fetch the object IDs of the operations teams' and admin security groups.

Application Operations Frontend Team:

```bash
FE_GROUP_OBJECT_ID=$(az ad group show --group $OPS_FE_GROUP --query 'id' --output tsv)
```

Application Operations Backend Team:

```bash
BE_GROUP_OBJECT_ID=$(az ad group show --group $OPS_BE_GROUP --query 'id' --output tsv)
```

Admin:

```bash
ADMIN_GROUP_OBJECT_ID=$(az ad group show --group $ADMIN_GROUP --query 'id' --output tsv)
```

3. These commands will grant the Application Operations Frontend Team group users the permissions to download the credentials for AKS, and to operate only within the given namespace:

```bash
az role assignment create --assignee $FE_GROUP_OBJECT_ID --role "Azure Kubernetes Service RBAC Writer" --scope ${AKS_RESOURCE_ID}${AKS_FRONTEND_NAMESPACE}
az role assignment create --assignee $FE_GROUP_OBJECT_ID --role "Azure Kubernetes Service Cluster User Role" --scope ${AKS_RESOURCE_ID}
```

4. These commands will grant the Application Operations Backend Team group users the permissions to download the credentials for AKS, and to operate only within the given namespace:

```bash
az role assignment create --assignee $BE_GROUP_OBJECT_ID --role "Azure Kubernetes Service RBAC Writer" --scope ${AKS_RESOURCE_ID}${AKS_BACKEND_NAMESPACE}
az role assignment create --assignee $BE_GROUP_OBJECT_ID --role "Azure Kubernetes Service Cluster User Role" --scope ${AKS_RESOURCE_ID}
```

5. This command will grant the Admin group users the permissions to connect to and manage all aspects of the AKS cluster:

```bash
az role assignment create --assignee $ADMIN_GROUP_OBJECT_ID --role "Azure Kubernetes Service RBAC Cluster Admin" --scope ${AKS_RESOURCE_ID}
```

The current architecture can now be described as follows:

1.4.5 Create Users and Assign Them to Security Groups

This exercise will guide you through the steps of creating three users and adding them to their corresponding security groups.

1. Create the admin user:

```bash
az ad user create --display-name $AAD_ADMIN_DISPLAY_NAME --user-principal-name $AAD_ADMIN_UPN --password $AAD_ADMIN_PW
```

2. Assign the admin user to the admin group for the AKS cluster. First identify the object ID of the user, as we will need it to assign the user to the admin group:

```bash
ADMIN_USER_OBJECT_ID=$(az ad user show --id $AAD_ADMIN_UPN --query 'id' --output tsv)
```

3. Assign the user to the admin security group:

```bash
az ad group member add --group $ADMIN_GROUP --member-id $ADMIN_USER_OBJECT_ID
```

4. Create the frontend operations user:

```bash
az ad user create --display-name $AAD_OPS_FE_DISPLAY_NAME --user-principal-name $AAD_OPS_FE_UPN --password $AAD_OPS_FE_PW
```

5. Assign the frontend operations user to the frontend security group for the AKS cluster. First identify the object ID of the user:

```bash
FE_USER_OBJECT_ID=$(az ad user show --id $AAD_OPS_FE_UPN --query 'id' --output tsv)
```

6. Assign the user to the frontend security group:

```bash
az ad group member add --group $OPS_FE_GROUP --member-id $FE_USER_OBJECT_ID
```

7. Create the backend operations user:

```bash
az ad user create --display-name $AAD_OPS_BE_DISPLAY_NAME --user-principal-name $AAD_OPS_BE_UPN --password $AAD_OPS_BE_PW
```

8. Assign the backend operations user to the backend security group for the AKS cluster. First identify the object ID of the user:

```bash
BE_USER_OBJECT_ID=$(az ad user show --id $AAD_OPS_BE_UPN --query 'id' --output tsv)
```

9. Assign the user to the backend security group:

```bash
az ad group member add --group $OPS_BE_GROUP --member-id $BE_USER_OBJECT_ID
```

The current architecture can now be described as follows:

1.4.6 Validate Your Deployment in the Azure Portal

1. Navigate to the Azure portal at https://portal.azure.com and enter your login credentials.
2. Once logged in, click on the portal menu (three stripes) in the top left-hand corner.
3. From the menu list, click on Microsoft Entra ID.
4. In the left-hand menu under Manage, click on Users.
5. Validate that your users are created; there should be three users, each with a name ending in your student name.
6. On the top menu bar, click on the Users link.
7. In the left-hand menu under Manage, click on Groups. Ensure you have three groups; the group names should end with your student name.
8. Click on the security group called Ops_Backend_team-YOUR STUDENT NAME.
9. In the left-hand menu, click on Members and verify that your user Backend-YOUR STUDENT NAME is assigned.
10. In the left-hand menu, click on Azure role assignments, and from the drop-down menu select your subscription. Ensure the following roles are assigned to the group: Azure Kubernetes Service Cluster User Role, assigned at the cluster level, and Azure Kubernetes Service RBAC Writer, assigned at the level of the namespace called backend.
11. On the top menu bar, click on the Groups link.

Repeat steps 7–11 for Ops_Frontend_team-YOUR STUDENT NAME and ClusterAdminGroup-YOUR STUDENT NAME.

1.4.7 Validate the Access for the Different Users

This section will demonstrate how to connect to the AKS cluster from the jumpbox, using the user accounts defined in Microsoft Entra ID.

Note: If you deployed your AKS cluster using the quick start method.

We will check two things: first, that we can successfully connect to the cluster; and second, that the operations teams have access only to their own namespaces, while the admin has full access to the cluster.

1. Navigate to the Azure portal at https://portal.azure.com and enter your login credentials.
2. Once logged in, locate and select the rg-hub resource group, where the jumpbox has been deployed.
3. Within your resource group, find and click on the Jumpbox VM.
4. In the left-hand menu, under the Operations section, select Bastion.
5. Enter the credentials for the Jumpbox VM and verify that you can log in successfully.
6. First remove the existing stored configuration that you have previously downloaded with the Azure CLI and kubectl. From the Jumpbox VM, execute the following commands:

```bash
rm -R .azure/
rm -R .kube/
```

Note: The .azure and .kube directories store configuration files for Azure and Kubernetes, respectively, for your user account. Removing these files triggers a login prompt, allowing you to re-authenticate with different credentials.

7. Retrieve the username and password for the frontend user.

Important: Retrieve the username and password from your local shell, not from the shell on the Jumpbox VM.

```bash
echo $AAD_OPS_FE_UPN
echo $AAD_OPS_FE_PW
```

8. From the Jumpbox VM, initiate the authentication process:

```bash
az login
```

Example output:

```
azureuser@Jumpbox-VM:~$ az login
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXXXXXX to authenticate.
```

9. Open a new tab in your web browser and access https://microsoft.com/devicelogin. Enter the generated code, and press Next.

10. You will be prompted with an authentication window asking which user you want to log in with. Select "Use another account", supply the username in the AAD_OPS_FE_UPN variable and the password from the AAD_OPS_FE_PW variable, and then press Next.

Note: When you authenticate with a user for the first time, you will be prompted by Microsoft Authenticator to set up multi-factor authentication (MFA). Choose the "I want to set up a different method" option from the drop-down menu, select Phone, supply your phone number, and receive a one-time passcode to authenticate to Azure with your user account.

11. From the Jumpbox VM, download the AKS cluster credentials:
```bash
SPOKE_RG=rg-spoke
STUDENT_NAME=  # set to your student name
AKS_CLUSTER_NAME=private-aks
az aks get-credentials --resource-group $SPOKE_RG --name $AKS_CLUSTER_NAME-${STUDENT_NAME}
```

You should see a similar output as illustrated below:

```
azureuser@Jumpbox-VM:~$ az aks get-credentials --resource-group $SPOKE_RG --name $AKS_CLUSTER_NAME-${STUDENT_NAME}
Merged "private-aks" as current context in /home/azureuser/.kube/config
azureuser@Jumpbox-VM:~$
```

12. You should be able to list all pods in the frontend namespace. You will now be prompted to authenticate your user again, as this time your newly created user's permissions within the AKS cluster are validated. Ensure you log in with the user you created, i.e. $AAD_OPS_FE_UPN, and not your company email address.

```bash
kubectl get po -n frontend
```

Example output:

```
azureuser@Jumpbox-VM:~$ kubectl get po -n frontend
To sign in, use a web browser to open the page https://microsoft.com/devicelogin and enter the code XXXXXXX to authenticate.
NAME    READY   STATUS    RESTARTS   AGE
nginx   1/1     Running   0          89m
```

13. Try to list pods in the default namespace:

```bash
kubectl get pods
```

Example output:

```
azureuser@Jumpbox-VM:~$ kubectl get po
Error from server (Forbidden): pods is forbidden: User "opsfe-test@xxxxxxxxxx.onmicrosoft.com" cannot list resource "pods" in API group "" in the namespace "default": User does not have access to the resource in Azure. Update role assignment to allow access.
```

14. Repeat steps 6 to 13 for the remaining users, and see how their permissions differ.

```bash
# Username and password for the admin user; run from your local shell, not from the Jumpbox VM
echo $AAD_ADMIN_UPN
echo $AAD_ADMIN_PW

# Username and password for the backend user; run from your local shell, not from the Jumpbox VM
echo $AAD_OPS_BE_UPN
echo $AAD_OPS_BE_PW
```
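As a quicker, optional way to compare what each signed-in account can do, kubectl can report effective permissions directly (purely illustrative):

```bash
# Run as each signed-in user to see their effective RBAC without trial and error.
kubectl auth can-i list pods -n frontend
kubectl auth can-i list pods -n backend
kubectl auth can-i list pods --all-namespaces
```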
🎉 Congratulations, you made it to the end! You've just navigated the wild waters of Microsoft Entra ID and AKS — and lived to tell the tale. Whether you're now a cluster conqueror or an identity integration ninja, give yourself a high five (or a kubectl get pods if that's more your style). Now go forth and secure those clusters like the cloud hero you are. 🚀 And remember: with great identity comes great responsibility.

Reimagining App Modernization for the Era of AI

This blog highlights the key announcements and innovations from Microsoft Build 2025. It focuses on how AI is transforming the software development lifecycle, particularly in app modernization. Key topics include the use of GitHub Copilot for accelerating development and modernization, the introduction of the Azure SRE agent for managing production systems, and the launch of the App Modernization Guidance to help organizations modernize their applications with AI-first design. The blog emphasizes a strategic approach to modernization, aiming to reduce complexity, improve agility, and deliver measurable business outcomes.
Get ready for Microsoft Build 2025

Microsoft Build is just a few weeks away. To celebrate, we're highlighting resources that will help you get ready for the big event. Explore some of the exciting sessions you can join in person or online, learn new skills before jumping into live deep-dive sessions, brush up on best practices, and get up to speed on the latest developer tools so you can hit the event ready to take your knowledge (and your applications) to the next level.

Connect, code, and grow at Build
It's almost time for Microsoft Build! Can't join the event live in person? No problem. You can still experience the event streaming live online for free (May 19-22). Watch the keynote, join live sessions, learn new skills, and watch in-depth demos.

Join the .NET & C# teams at Microsoft Build 2025
Don't miss this opportunity to connect with the .NET and C# teams at Microsoft Build. There are more than 75 .NET sessions planned for this year's event. Check out this blog post for highlights of some of the .NET sessions to watch for.

How Microsoft developers use AI in real-world coding
Tired of the AI hype? Watch this Microsoft Build session to see how real developers at Microsoft use GitHub Copilot in their own coding workflows, with live coding demos. Leave with practical tips for applying AI to your coding.

What's next in C#
This Microsoft Build session will provide a demo-rich tour of upcoming features in C# 14 and beyond. See what's next in C# and discover how C# keeps making your code clearer, cleaner, and more expressive.

Python meets .NET: Building AI solutions with combined strengths
Python and .NET are a great combination! Join this Microsoft Build session where Scott Hanselman and Anthony Shaw will discuss and demo how Python can help spice up your .NET applications.

Under the hood and into the magic of GitHub Copilot
Have you ever wondered how GitHub Copilot turns prompts into code suggestions? In this Microsoft Build session, the GitHub Copilot team will look at how they built this tool, how it works, and how it keeps up in a quickly evolving landscape.

VS Code, Live! at Microsoft Build
VS Code, Live! is doing a special in-person session, and you're invited. Join the session at Microsoft Build (or stream it online) to watch interactive demos, see some sneak peeks, and get a behind-the-scenes look at the latest announcements.

Which AI model should I use with GitHub Copilot?
If you've ever wondered which AI model is the best fit for your GitHub Copilot project, you're not alone. Each model has its own strengths, and picking the right one can feel mysterious. This blog post outlines things to consider when deciding.

Elevate your AI skills today
Unlock the power of AI and prepare for the upcoming Microsoft Build event. Visit the AI learning hub on Microsoft Learn to explore innovative technologies, learn new skills, and improve your developer productivity.

Top 12 scenarios to streamline Azure tasks in VS Code using GitHub Copilot
GitHub Copilot for Azure can help streamline your Azure workflows. This collection of videos explores 12 scenarios where GitHub Copilot can accelerate your cloud projects and developer tasks.

VS Code agent mode just changed everything
Can Visual Studio Code agent mode build an entire app for you, complete with a database? Burke Holland demonstrates how. But will it work? You'll have to watch the video to find out.

.NET Conf: Focus on Modernization
Missed the .NET Conf: Focus on Modernization event? Catch up on all the sessions with the on-demand playlist.
Inside Microsoft Dev Box: Scalable cloud development
Microsoft Dev Box provides secure, ready-to-code cloud development environments that help teams move faster and stay secure. Watch a demo of how a company would set up Microsoft Dev Box for their developers in Seattle and Berlin.

AI Skills Fest Challenge: Extend Microsoft 365 Copilot Chat and Teams with agents and apps
Join the challenge and learn how to extend Microsoft 365 Copilot Chat and Microsoft Teams with custom apps and agents. This challenge will guide you through modules that help you learn along the way.

Developer focused how-tos, use cases, and solutions on Microsoft Azure
Want to dive into developer-focused "How To" content, explore use cases, and learn about solutions? Check out the All Things Azure blog. Learn how to build multi-agent AI apps, understand how GitHub Copilot works under the hood, and more.

Sip & Sync with Azure
Watch the Sip & Sync with Azure video series to get up to speed on the latest Azure and AI information. Episodes explore topics like GenAI on Azure and quickly building AI agents with Azure AI Foundry.

Season of Agents: Find an event near you
Find a local event to dive into agents and learn new skills.

Watch JDConf 2025 on demand
JDConf 2025 is over, but you can still watch the sessions on demand. Learn about building modern apps in the cloud, integrating AI, using AI-assisted dev tools, and other critical skills for Java developers.

From prompt to production: Build a landing page with Copilot agent mode
See how you can build a developer-focused landing page in under 30 minutes using GitHub Copilot agent mode and Claude 3.5 Sonnet, with just screenshots and prompts.

Tracking Kubernetes Updates in AKS Clusters
When you support Azure Kubernetes Service (AKS) clusters, keeping up with new Kubernetes releases and ensuring that your clusters are on a supported version can be difficult. With one or two clusters it might be manageable, but as your estate grows it becomes hard to keep track of which clusters are running which version of Kubernetes and which need updates.

One way of dealing with this is to implement Azure Kubernetes Fleet Manager (Fleet). Fleet provides a comprehensive solution for monitoring Kubernetes and node image versions in your clusters, and for rolling out updates across your estate. You can read more details on Fleet for update management here. However, if you're not ready to implement Fleet, or your AKS estate isn't large enough to warrant it, we can build a solution using Resource Graph and Azure Workbooks that provides an overview of your AKS clusters' Kubernetes versions and shows which have pending upgrades.

Collecting Data

To create our report, we need two pieces of information:
- The version of Kubernetes deployed on each AKS cluster
- The currently supported versions of Kubernetes in AKS

The first piece of information we can get using a Resource Graph query. Resource Graph allows you to query and explore your Azure resources using the Kusto Query Language (KQL). We can query for any AKS clusters in our subscriptions and get the Kubernetes version property.

resources
| where type =~ 'microsoft.containerservice/managedclusters'
| extend currentVersion = tostring(properties.kubernetesVersion)
| project id, name, currentVersion

Running that in Resource Graph Explorer confirms that we get the data we need.

The second piece of information, the supported versions of Kubernetes, is not available through Resource Graph. We can get it from an Azure CLI command:

az aks get-versions --location eastus --output table

The location should be set to wherever your clusters are, to get versions specific to your region; version updates roll out to different regions at different times. You can track which releases are in which regions using the AKS Release Tracker. This command outputs a table of Kubernetes versions that are supported on AKS, along with details of the support plan (standard or LTS) and patch versions.

We could manually compare the two values, but that's not going to scale, so we need to bring these two pieces of data together to create an automated report.

Azure Workbook

We need to build a report that shows all clusters with upgrades available, based on the data we saw above. Azure Workbooks allow us to query data from various places in Azure and bring it into a single report.

Create a Workbook

To build our report we will be using Azure Monitor Workbooks, which allow us to bring multiple Azure data sources into a single report and visualise the data. We need to create a new workbook and open it for editing.

1. In the Azure Portal, search for "Azure Workbooks" and click to open the workbooks page.
2. Click "Create" to create a new workbook, then select the empty workbook option from the quick start list.

You should now have an empty workbook, ready for you to add content.

Supported Versions

Getting the supported versions of Kubernetes into our report is probably the trickiest part. We can't just run an Azure CLI command in an Azure workbook, so we can't replicate what we did above.
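That said, the az aks get-versions command is just a wrapper around an Azure Resource Manager REST endpoint, and you can preview the raw response from any terminal before wiring it into the workbook. A minimal sketch using az rest (substitute your own subscription ID and region; the api-version shown matches the one used in the next step):

az rest --method get \
  --url "/subscriptions/{SubscriptionID}/providers/Microsoft.ContainerService/locations/westeurope/kubernetesVersions?api-version=2025-02-01"

This returns the same JSON document that the workbook query below will consume.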
Because the CLI is simply calling that Azure Resource Manager API, we can call the same API from our workbook using an Azure Resource Manager query.

1. In our workbook, go to the Add button and click Add Query.
2. In the Data Source drop-down, select "Azure Resource Manager".
3. Leave the method set as "GET".
4. In the "Path" box, enter a path similar to the one below, replacing {SubscriptionID} with your own subscription ID. This can be any subscription you have access to; it does not need to contain the AKS clusters. The region in the path should match your clusters' region to ensure the versions returned are appropriate for them.

/subscriptions/{SubscriptionID}/providers/Microsoft.ContainerService/locations/westeurope/kubernetesVersions?api-version=2025-02-01

If you click "Run Query" you should get a JSON document back with details of supported AKS versions. Whilst this provides us with enough information, it's not going to look great on your report, and we also want to allow the user to select which version to query against, so let's clean it up.

1. Select the "Results Settings" tab.
2. Switch the format from "Content" to "JSON Path".
3. Configure the JSON Path settings so that the result is a clean table showing the version and the support plan.

If we now run the query we should have a nice table. The last thing we want to do is allow the user to select the version of Kubernetes they are interested in, and set this as a parameter so we can use it in our Resource Graph query.

1. Click on the "Advanced Settings" button.
2. Check the "When items are selected, export parameters" box.
3. Click the "Add Parameter" button.
4. Set the "Field to Export" to "version", and set "Parameter Name" to any name you wish.

Feel free to set some of the other fields, such as titles and no-data messages, if you wish. Click "Done Editing" to commit your changes.

Resource Graph

Now we have our supported versions, we can create a Resource Graph query that finds the clusters running a version of Kubernetes older than our selected supported version.

1. Once again click Add and go to Add Query.
2. In the Data Source drop-down, select "Azure Resource Graph". Keep the resource type as "subscriptions", then set the subscriptions drop-down to either the subscriptions you are interested in, or all subscriptions.
3. In the query box, enter the query below. We'll break down what it is doing in a moment.

resources
| where type =~ 'microsoft.containerservice/managedclusters'
| extend currentVersion = tostring(properties.kubernetesVersion)
| extend orchestratorVersion = "{k8VersionNum}"
| extend parsedClusterVersion = parse_version(currentVersion), parsedOrchestratorVersion = parse_version(orchestratorVersion)
| where parsedClusterVersion < parsedOrchestratorVersion
| project id, name, location, currentVersion, orchestratorVersion

If you select a version in the supported versions list we created earlier and then click "Run Query", you should get a list of AKS clusters back, assuming you have clusters running older versions. Here's what this query is doing:

- Finding all resources with a type of "microsoft.containerservice/managedclusters", which is the resource type for AKS clusters
- Getting the Kubernetes version from the cluster and assigning it to a variable called "currentVersion"
- Getting the "k8VersionNum" parameter that we set in the previous step and putting it in a variable called "orchestratorVersion"
- Using the parse_version function to parse these two strings into version numbers that can be numerically compared
- Checking whether the cluster version is lower than the selected Kubernetes version
- Projecting the values we want to see on the report
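As a side note, essentially the same comparison can be run from a terminal with the Resource Graph CLI extension. A rough sketch, with the target version hard-coded rather than taken from a workbook parameter (the version string here is only an example):

az extension add --name resource-graph
az graph query -q "resources
| where type =~ 'microsoft.containerservice/managedclusters'
| extend currentVersion = tostring(properties.kubernetesVersion)
| where parse_version(currentVersion) < parse_version('1.30.0')
| project id, name, location, currentVersion"

The workbook remains the better home for this logic, though, since the parameter selection gives you an interactive report.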
The final step we're going to add is to hide this section when no Kubernetes version is selected in the top table. This avoids the page showing an error when the parameter is empty.

1. Click on the "Advanced Settings" tab.
2. Check the "Make this item conditionally visible" box.
3. Click "Add Condition", then enter the name of your parameter, in my case k8VersionNum. Set the drop-down to "is not equal" and leave the value empty, so it shows "not set".

You can also optionally update the "No Data" message to show a success message, indicating that there are no clusters with pending updates when the query returns nothing.

Completed Workbook

After adding a few Markdown text fields to make the report more usable, and then saving the workbook, we can publish our solution. You can find the full workbook definition code here; you just need to update the subscription ID.

Alternative Workbook

If you would like to reverse the approach we took, and instead select an AKS cluster and see what upgrades are available for it, you can use this code to create the workbook.
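Finally, once the workbook flags a cluster, you can drill into the upgrade paths available for that specific cluster from the CLI. For example (the resource group and cluster names are placeholders):

az aks get-upgrades --resource-group <resource-group> --name <cluster-name> --output table

This shows the versions the cluster can move to, which is useful when planning the actual upgrade.

FSI Knowledge Mining and Intelligent Document Process Reference Architecture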
FSI customers such as insurance companies and banks rely on their vast amounts of data to provide sometimes hundreds of individual products to their customers. Across product suitability assessments, underwriting, fraud investigations, and claims handling, many employees and applications depend on access to this data to do their jobs efficiently. Since the capabilities of GenAI became clear, we have been helping our customers in this market transform their business with unified systems that simplify access to this data and speed up the processing of these core tasks, while remaining compliant with the numerous regulations that govern the FSI space.

Combining Knowledge Mining with Intelligent Document Processing provides a powerful solution that reduces the manual effort and inefficiencies involved in ensuring data integrity and retrieval across the many use cases most of our customers face daily.

What are Knowledge Mining and Intelligent Document Processing?

Knowledge Mining is a process that transforms large, unstructured data sets into searchable knowledge stores. Traditional search methods often rely on keyword matching, which can miss the context of the information. In contrast, knowledge mining uses advanced techniques like natural language processing (NLP) to understand the context and meaning behind the data, providing a robust search mechanism that can look across all these data sources and understand the relationships between the data, producing more accurate and relevant results.

Intelligent Document Processing (IDP) is a workflow automation technology designed to scan, read, extract, categorise, and organise meaningful information from large streams of data. Its primary function is to extract valuable information from extensive data sets without human input, thereby increasing processing speed and accuracy while reducing costs. By leveraging a combination of Artificial Intelligence (AI), Machine Learning (ML), Optical Character Recognition (OCR), and Natural Language Processing (NLP), IDP handles both structured and unstructured documents. By ensuring that the processed data meets the "gold standard" (structured, complete, and compliant), IDP helps organizations maintain high-quality, reliable, and actionable data.

The Power of Knowledge Mining and Intelligent Document Processing as a Unified Solution

Knowledge Mining excels at quickly responding to natural language queries, providing valuable insights and making previously unsearchable data accessible. At the same time, IDP ensures that the processed data meets the "gold standard" (structured, complete, and compliant), making it both reliable and actionable. Together, these technologies empower organisations to harness the full potential of their data, driving better decision-making and improved efficiency.

__________________________________________________________________

Meet Alex: A Day in the Life of a Fraud Case Worker

Responsibilities:
- Investigate potential fraud cases by manually searching across multiple systems.
- Read and analyse large volumes of information to filter out relevant data.
- Ensure compliance with regulatory requirements and maintain data accuracy.
- Prepare detailed reports on findings and recommendations.

Lost in Data: The Struggles of Manual Fraud Investigation

Alex receives a new fraud case and starts by manually searching through multiple systems to gather information. This process takes several hours, and Alex has to read through numerous documents and emails to filter out relevant data.
The inconsistent data formats and locations make it challenging to ensure accuracy. By the end of the day, Alex is exhausted and has made only limited progress on the case.

Effortless Efficiency: Fraud Investigation Transformed with Knowledge Mining and IDP

Alex receives a new fraud case and needs to gather all relevant information quickly. Instead of manually searching through multiple systems, Alex inputs the following natural language query into the unified system: "Show me all documents, emails, and notes related to the recent transactions of client X that might indicate fraudulent activity."

The system quickly retrieves and presents a comprehensive summary of all relevant documents, emails, and notes, ensuring that the data is structured, complete, and compliant. This allows Alex to focus on analysing the data and making informed decisions, significantly improving the efficiency and accuracy of the investigation.

How have Knowledge Mining and IDP transformed Alex's role?

Before implementing Knowledge Mining and Intelligent Document Processing, Alex faced a manual process of searching across multiple systems to gather information. This was time-consuming and labour-intensive, often leading to delays in investigations. The overwhelming volume of data from various sources made it difficult to filter out relevant information, and the inconsistent data formats and locations increased the risk of errors. This high workload not only reduced Alex's efficiency but also led to burnout and decreased job satisfaction.

With the introduction of a unified system powered by Knowledge Mining and IDP, these challenges were significantly mitigated. Automated searches using natural language queries allowed Alex to quickly find relevant information, while IDP ensured that the data processed was structured, complete, and compliant. This unified system provided a comprehensive view of the data, enabling Alex to make more informed decisions and focus on higher-value tasks, ultimately improving productivity and job satisfaction.

____________________________________________________________________

Example Architecture

Knowledge Mining

1. Users interact with the system through a portal on the customer's front-end of choice. This serves as the entry point for submitting queries and accessing the knowledge mining service. Front-end options could include web apps, container services, or serverless integrations.
2. Azure AI Search provides powerful RAG capabilities, while Azure OpenAI provides access to large language models to summarise responses. Combined, these services take the user's query, search the knowledge base, and return relevant information, which can be augmented as required. Prompt engineering can customise how the data is returned.
3. You define the data sources your Azure AI Search will consume. These can be Azure storage services or other data repositories. Data that meets a pre-defined gold standard is queried by Azure AI Search and relevant data is returned to the user. Gold-standard data could be defined by compliance or business needs.
4. Power BI can be used to create analytical reports based on the data retrieved and processed. This step involves visualising the data in an interactive and user-friendly manner, allowing users to gain insights and make data-driven decisions.
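To make steps 2 and 3 concrete, here is a minimal sketch of the retrieve-then-summarise flow using the two services' REST APIs from a shell. It is illustrative only: the service, index, and deployment names are placeholders, the API keys are assumed to be in environment variables, and in production you would typically use managed identity rather than keys.

# 1. Retrieve candidate documents from the gold-standard index in Azure AI Search
results=$(curl -s -X POST "https://<search-service>.search.windows.net/indexes/<gold-index>/docs/search?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $SEARCH_KEY" \
  -d '{"search": "recent transactions of client X possible fraud", "top": 5}')

# 2. Ask an Azure OpenAI chat deployment to summarise the retrieved passages
curl -s -X POST "https://<aoai-resource>.openai.azure.com/openai/deployments/<deployment>/chat/completions?api-version=2024-02-01" \
  -H "Content-Type: application/json" \
  -H "api-key: $AOAI_KEY" \
  -d "$(jq -n --arg ctx "$results" '{
        messages: [
          {role: "system", content: "Summarise the evidence for a fraud case worker."},
          {role: "user", content: $ctx}
        ]
      }')"

A real implementation would trim the search response to the relevant fields before passing it to the model, but the shape of the flow (query the index, then let the LLM summarise) is the same.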
Intelligent Document Processing (Optional)

1. Azure Data Factory is a data integration service that allows you to create workflows for moving and transforming data at scale. Business data can be easily ingested into your Azure data storage solutions using pre-built connectors. This event-driven approach ensures that as new data is generated, it can automatically be processed and made ready for use in your knowledge mining solution.
2. Data can be transformed using Function Apps and Azure OpenAI. Through prompt engineering, the large language model (LLM) can highlight specific issues in the documents, such as grammatical errors, irrelevant content, or incomplete information. The LLM can then be used to rewrite text to improve clarity and accuracy, add missing information, or reformat content to adhere to guidelines.
3. Transformed data is stored as gold-standard data.

____________________________________________________________________

Additional Cloud Considerations

Networking

VNETs (Virtual Networks) are a fundamental component of cloud infrastructure that enable secure and isolated networking configurations within a cloud environment. They allow different resources, such as virtual machines, databases, and services, to communicate with each other securely. Virtual networks ensure that services such as Azure AI Search, Azure OpenAI, and Power BI can communicate with each other securely, which is crucial for maintaining the integrity and confidentiality of sensitive financial data.

ExpressRoute or VPN is expected to be used when connecting on-premises infrastructure to Azure. Azure ExpressRoute provides a private, reliable, and high-speed connection between your data center and Microsoft Azure. It allows you to extend your infrastructure to Azure by providing private access to resources deployed in Azure Virtual Networks and public services like App Service, as well as private endpoints to various other services. This private peering ensures that your traffic never enters the public Internet, enhancing security and performance. ExpressRoute uses Border Gateway Protocol (BGP) for dynamic routing between your on-premises networks and Azure, ensuring efficient and secure data exchange. It also offers built-in redundancy and high availability, making it a robust solution for critical workloads.

Azure Front Door is a cloud-based Content Delivery Network (CDN) and application delivery service provided by Microsoft. It offers several key features, including global load balancing, dynamic site acceleration, SSL offloading, and a web application firewall, making it an ideal solution for optimizing and protecting web applications. We expect to use Front Door in scenarios where the architecture will be used by users outside the organisation.

Azure API Management is expected to be used when rolling the solution out to larger groups, adding capabilities such as stronger security controls, rate limiting, and load balancing.

Monitoring and Governance

Azure Monitor: Collects and analyses telemetry data from various resources, providing insights into the performance and health of the system. It enables proactive identification and resolution of issues, ensuring the system runs smoothly.

Azure Cost Management and Billing: Provides tools for monitoring and controlling costs associated with the solution. It offers insights into spending patterns and resource usage, enabling efficient financial governance.
Application Insights: Provides application performance monitoring (APM) designed to help you understand how your applications are performing and to identify issues that may affect their performance and reliability.

Together, these components ensure that the Knowledge Mining and Intelligent Document Processing solution is monitored for performance, secured against threats, compliant with regulations, and managed efficiently from a cost perspective.

____________________________________________________________________

Next steps:
- Identify the data, and its sources, that will feed into your own knowledge mine. Consider whether you also need to implement Intelligent Document Processing to ensure data quality.
- Define your 'gold standards'. These guidelines will determine how your data might be transformed.
- Consider how to provide access to the data through an application portal, and choose the right front-end technology for your use case.
- Once you have configured Azure AI Search to point to the chosen data, consider how you might augment responses using Azure OpenAI LLM models.

Useful resources
- AI Landing Zone reference architecture
- Azure and Open AI with API Manager
- Secure connectivity from on-premises to Azure hosted solutions

Keep Your Azure Functions Up to Date: Identify Apps Running on Retired Versions
Running Azure Functions on retired language versions can lead to security risks, performance issues, and potential service disruptions. While the Azure Functions team notifies users about upcoming retirements through the portal, emails, and warnings, identifying affected Function Apps across multiple subscriptions can be challenging. To simplify this, we've provided Azure CLI scripts to help you:

✅ Identify all Function Apps using a specific runtime version
✅ Find apps running on unsupported or soon-to-be-retired versions
✅ Take proactive steps to upgrade and maintain a secure, supported environment

Read on for the full set of Azure CLI scripts and instructions on how to upgrade your apps today!

Why Upgrading Your Azure Functions Matters

Azure Functions supports six different programming languages, with new stack versions being introduced and older ones retired regularly. Staying on a supported language version is critical to ensure:

- Continued access to support and security updates
- Avoidance of performance degradation and unexpected failures
- Compliance with best practices for cloud reliability

Failure to upgrade can lead to security vulnerabilities, performance issues, and unsupported workloads that may eventually break. Azure's language support policy follows a structured deprecation timeline, which you can review here.

How Will You Know When a Version Is Nearing Its End-of-Life?

The Azure Functions team communicates retirements well in advance through multiple channels:

- Azure Portal notifications
- Emails to subscription owners
- Warnings in client tools and the Azure Portal UI when an app is running on a version that is either retired, or about to be retired in the next 6 months
- The official Azure Functions Supported Languages document here

To help you track these changes, we recommend reviewing the language version support timelines in the Azure Functions Supported Languages document. However, identifying all affected apps across multiple subscriptions can be challenging. To simplify this process, I've built the Azure CLI scripts below to help you list all impacted Function Apps in your environment.
Linux* Function Apps with their language stack versions:

az functionapp list --query "[?siteConfig.linuxFxVersion!=null && siteConfig.linuxFxVersion!=''].{Name:name, ResourceGroup:resourceGroup, OS:'Linux', LinuxFxVersion:siteConfig.linuxFxVersion}" --output table

*Running on Elastic Premium and App Service Plans

Linux* Function Apps on a specific language stack version (example: Node.js 18):

az functionapp list --query "[?siteConfig.linuxFxVersion=='Node|18'].{Name:name, ResourceGroup:resourceGroup, OS: 'Linux', LinuxFxVersion:siteConfig.linuxFxVersion}" --output table

*Running on Elastic Premium and App Service Plans

Windows Function Apps only:

az functionapp list --query "[?!contains(kind, 'linux')].{Name:name, ResourceGroup:resourceGroup, OS:'Windows'}" --output table

Windows Function Apps with their language stack versions:

az functionapp list --query "[?!contains(kind, 'linux')].{name: name, resourceGroup: resourceGroup}" -o json | ConvertFrom-Json | ForEach-Object {
    $appSettings = az functionapp config appsettings list -n $_.name -g $_.resourceGroup --query "[?name=='FUNCTIONS_WORKER_RUNTIME' || name=='WEBSITE_NODE_DEFAULT_VERSION']" -o json | ConvertFrom-Json
    $siteConfig = az functionapp config show -n $_.name -g $_.resourceGroup --query "{powerShellVersion: powerShellVersion, netFrameworkVersion: netFrameworkVersion, javaVersion: javaVersion}" -o json | ConvertFrom-Json
    $runtime = ($appSettings | Where-Object { $_.name -eq 'FUNCTIONS_WORKER_RUNTIME' }).value
    $version = switch ($runtime) {
        'node' { ($appSettings | Where-Object { $_.name -eq 'WEBSITE_NODE_DEFAULT_VERSION' }).value }
        'powershell' { $siteConfig.powerShellVersion }
        'dotnet' { $siteConfig.netFrameworkVersion }
        'java' { $siteConfig.javaVersion }
        default { 'Unknown' }
    }
    [PSCustomObject]@{
        Name = $_.name
        ResourceGroup = $_.resourceGroup
        OS = 'Windows'
        Runtime = $runtime
        Version = $version
    }
} | Format-Table -AutoSize

Windows Function Apps running on the Node.js runtime:

az functionapp list --query "[?!contains(kind, 'linux')].{name: name, resourceGroup: resourceGroup}" -o json | ConvertFrom-Json | ForEach-Object {
    $appSettings = az functionapp config appsettings list -n $_.name -g $_.resourceGroup --query "[?name=='FUNCTIONS_WORKER_RUNTIME' || name=='WEBSITE_NODE_DEFAULT_VERSION']" -o json | ConvertFrom-Json
    $runtime = ($appSettings | Where-Object { $_.name -eq 'FUNCTIONS_WORKER_RUNTIME' }).value
    if ($runtime -eq 'node') {
        $version = ($appSettings | Where-Object { $_.name -eq 'WEBSITE_NODE_DEFAULT_VERSION' }).value
        [PSCustomObject]@{
            Name = $_.name
            ResourceGroup = $_.resourceGroup
            OS = 'Windows'
            Runtime = $runtime
            Version = $version
        }
    }
} | Format-Table -AutoSize

Windows Function Apps running on a specific language version (example: Node.js 18):

az functionapp list --query "[?!contains(kind, 'linux')].{name: name, resourceGroup: resourceGroup}" -o json | ConvertFrom-Json | ForEach-Object {
    $appSettings = az functionapp config appsettings list -n $_.name -g $_.resourceGroup --query "[?name=='FUNCTIONS_WORKER_RUNTIME' || name=='WEBSITE_NODE_DEFAULT_VERSION']" -o json | ConvertFrom-Json
    $runtime = ($appSettings | Where-Object { $_.name -eq 'FUNCTIONS_WORKER_RUNTIME' }).value
    $nodeVersion = ($appSettings | Where-Object { $_.name -eq 'WEBSITE_NODE_DEFAULT_VERSION' }).value
    if ($runtime -eq 'node' -and $nodeVersion -eq '~18') {
        [PSCustomObject]@{
            Name = $_.name
            ResourceGroup = $_.resourceGroup
            OS = 'Windows'
            Runtime = $runtime
            Version = $nodeVersion
        }
    }
} | Format-Table -AutoSize

All Windows Function Apps running on unsupported language runtimes (as of March 2025):

az functionapp list --query "[?!contains(kind, 'linux')].{name: name, resourceGroup: resourceGroup}" -o json | ConvertFrom-Json | ForEach-Object {
    $appSettings = az functionapp config appsettings list -n $_.name -g $_.resourceGroup --query "[?name=='FUNCTIONS_WORKER_RUNTIME' || name=='WEBSITE_NODE_DEFAULT_VERSION']" -o json | ConvertFrom-Json
    $siteConfig = az functionapp config show -n $_.name -g $_.resourceGroup --query "{powerShellVersion: powerShellVersion, netFrameworkVersion: netFrameworkVersion}" -o json | ConvertFrom-Json
    $runtime = ($appSettings | Where-Object { $_.name -eq 'FUNCTIONS_WORKER_RUNTIME' }).value
    $version = switch ($runtime) {
        'node' {
            $nodeVer = ($appSettings | Where-Object { $_.name -eq 'WEBSITE_NODE_DEFAULT_VERSION' }).value
            if ([string]::IsNullOrEmpty($nodeVer)) { 'Unknown' } else { $nodeVer }
        }
        'powershell' { $siteConfig.powerShellVersion }
        'dotnet' { $siteConfig.netFrameworkVersion }
        default { 'Unknown' }
    }
    # Check whether the runtime version is unsupported
    $isUnsupported = switch ($runtime) {
        'node' {
            $ver = $version -replace '~', ''
            [double]$ver -le 16
        }
        'powershell' {
            $ver = $version -replace '~', ''
            [double]$ver -le 7.2
        }
        'dotnet' {
            $ver = $siteConfig.netFrameworkVersion
            $ver -notlike 'v7*' -and $ver -notlike 'v8*'
        }
        default { $false }
    }
    if ($isUnsupported) {
        [PSCustomObject]@{
            Name = $_.name
            ResourceGroup = $_.resourceGroup
            OS = 'Windows'
            Runtime = $runtime
            Version = $version
        }
    }
} | Format-Table -AutoSize

Take Action Now

By using these scripts, you can proactively identify and update Function Apps before they reach end-of-support status. Stay ahead of runtime retirements and ensure the reliability of your Function Apps.

For step-by-step instructions to upgrade your Function Apps, check out the Azure Functions language version upgrade guide. For more details on Azure Functions' language support lifecycle, visit the official documentation.

Have any questions? Let us know in the comments below!
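The unsupported-runtime script above targets Windows apps only. For Linux apps the runtime is encoded in linuxFxVersion, so a similar check can be sketched with a JMESPath filter. Treat the version strings below as illustrative examples of retired runtimes; verify them (and the exact casing used by your apps) against the current supported-languages table before relying on the output:

az functionapp list --query "[?siteConfig.linuxFxVersion!=null && (contains(siteConfig.linuxFxVersion,'Node|16') || contains(siteConfig.linuxFxVersion,'Node|14') || contains(siteConfig.linuxFxVersion,'PowerShell|7.2') || contains(siteConfig.linuxFxVersion,'Python|3.8'))].{Name:name, ResourceGroup:resourceGroup, OS:'Linux', LinuxFxVersion:siteConfig.linuxFxVersion}" --output table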