azure load testing
68 Topics

Scaling Azure Functions Python with orjson
Azure Functions now supports orjson in the Python worker, giving developers an easy way to boost performance by simply adding the library to their environment. Benchmarks show that orjson delivers measurable gains in throughput and latency, with the biggest improvements on the small-to-medium payloads common in real-world workloads. In tests, orjson improved throughput by up to 6% on 35 KB payloads and significantly reduced response times under load, while also eliminating dropped requests in high-throughput scenarios. With its Rust-based speed, standards compliance, and drop-in adoption, orjson offers a straightforward path to faster, more scalable Python Functions without any code changes.

Running a Load Test within a Chaos Experiment
With Azure Chaos Studio and Azure Load Testing, you can simulate both: run a controlled load test while injecting faults into your application or infrastructure to understand how it behaves under stress. Together, they help you find resiliency blind spots: the cascading failures, retry storms, and degraded dependencies that only appear when your system is both busy and broken. For example:
- What if your database becomes read-only during peak user traffic?
- How does your API react if a downstream service starts returning 500s?
- Can your autoscaling rules recover fast enough?

Let's explore how you can run load tests from Azure Load Testing as part of a chaos experiment.

Azure Chaos Studio + Azure Load Testing Integration

Azure Chaos Studio has load test actions that let you integrate load testing directly into your chaos experiment flow. In the Chaos Studio fault library, you can find:
- Start load test (Azure Load Testing)
- Stop load test (Azure Load Testing)

The start action triggers a load test from your Azure Load Testing resource as part of an experiment step. This means you can now orchestrate a sequence like:
- Start load test
- Inject a fault (e.g., shut down a VM, throttle the network, restart an App Service)
- Observe and measure resiliency
- Stop the test and analyze metrics

Chaos Experiment with Load Test Action

Here's how a typical experiment might look conceptually:

Step 1. Define the experiment in Chaos Studio

Create a new experiment that targets your application or infrastructure components, for example an App Service or a SQL Database. Add the Start Load Test (Azure Load Testing) action. This tells Chaos Studio to kick off a load test from Azure Load Testing.

Step 2. Add faults to simulate real-world failures

You can follow up the load test action with a fault such as:
- CPU pressure on your VM or container
- Network latency or packet loss injection
- Service shutdown of a dependent component

Step 3. Observe and analyze

Once the experiment runs, you can:
- View load test metrics (such as response times, error rates, and throughput) in Azure Load Testing
- View fault outcomes in Chaos Studio
- Correlate both using Application Insights or Log Analytics

This gives a holistic view of performance and resiliency under controlled failure. By combining load and chaos, you can answer:
- How does latency or failure in one microservice affect end-to-end response times?
- Do retry policies or circuit breakers behave as expected under load?
- Does the system self-heal once the fault is removed?
- What's the performance impact of failover mechanisms?

Conclusion

Chaos testing under load helps teams move from confidence to certainty. Azure's native integration between Chaos Studio and Load Testing makes it easier to build resiliency testing into your CI/CD pipeline using only Azure-native services.

Learn More
- Azure Chaos Studio documentation
- Azure Load Testing documentation

Azure Load Test Pricing
Hi,

This is regarding Azure Load Testing pricing. Please advise.

Virtual User Hour (VUH) usage:
0 - 10,000 Virtual User Hours: $0.15/VUH
10,000+ Virtual User Hours: $0.06/VUH

I am trying to understand the above pricing. Let's say I want to run a test with 10k users that just log in to my website. This will take at most 10 seconds to complete. How will the pricing be calculated?

Regards,
Sharukh

Throughput Testing at Scale for Azure Functions
Introduction

Ensuring reliable, high-performance serverless applications is central to our work on Azure Functions. With new plans like Flex Consumption expanding the platform's capabilities, it's critical to continuously validate that our infrastructure can scale reliably and efficiently under real-world load. To meet that need, we built PerfBench (Performance Benchmarker), a comprehensive benchmarking system designed to measure, monitor, and maintain our performance baselines, catching regressions before they impact customers.

This infrastructure now runs close to 5,000 test executions every month, spanning multiple SKUs, regions, runtimes, and workloads, with Flex Consumption accounting for more than half of the total volume. This scale of testing helps us not only identify regressions early, but also understand system behavior over time across an increasingly diverse set of scenarios.

[Figure: ... of all Python Function apps across regions (SKU: Flex Consumption, Instance Size: 2048; 1000 VUs over 5 mins, HTML Parsing test)]

Motivation: Why We Built PerfBench

The Need for Scale

Azure Functions supports a range of triggers, from HTTP requests to event-driven flows like Service Bus or Storage Queue messages. With an ever-growing set of runtimes (e.g., .NET, Node.js, Python, Java, PowerShell) and versions (like Python 3.11 or .NET 8.0), plus multiple SKUs and regions, the possible test combinations explode quickly. Manual testing or single-scenario benchmarks no longer cut it.

The current scope of test coverage:

Plan               PricingTier   DistinctTestName
FlexConsumption    FLEX2048      110
FlexConsumption    FLEX512       20
Consumption        CNS           36
App Service Plan   P1V3          32
Functions Premium  EP1           46

Table 1: Different test combinations per plan based on Stack, Pricing Tier, Scenario, etc. This doesn't include the ServiceBus tests.
The Flex Consumption Plan

There have been many iterations of this infrastructure within the team, and we have been continuously monitoring Functions performance for more than 4 years, with more than a million runs so far. But with the introduction of the Flex Consumption plan (in Preview at the time of building PerfBench), we had to redesign the testing from the ground up. Flex Consumption unlocks new scaling behaviors and needed thorough testing (millions of messages or tens of thousands of requests per second) to give us confidence in our performance goals and in regression prevention.

[Figure: ... Consumption, Instance Size: 2048)]

PerfBench: High-Level Architecture

Overview

PerfBench is composed of several key pieces:
- Resource Creator: Uses meta files and Bicep templates to deploy receiver function apps (test targets) at scale.
- Test Infra Generator: Deploys and configures the system that actually does the load generation (e.g., SBLoadGen function app, Scheduler function app, ALT webhook function).
- Test Infra: The "brain" of testing, including the Scheduler, Azure Load Testing integration, and SBLoadGen.
- Receiver Function Apps: Deployed once per combination of runtime, version, region, OS, SKU, and scenario.
- Data Aggregation & Dashboards: Gathers test metrics from Azure Load Testing (ALT) or SBLoadGen, stores them in Azure Data Explorer (ADX), and displays trends in ADX dashboards.

Below is a simplified architecture diagram illustrating these components:

[Figure: simplified architecture diagram]

Components

Resource Creator

The resource creator uses meta files and Jinja templates to generate Bicep templates for creating resources.
- Meta Files: We define test scenarios in simple text-based files (e.g., os.txt, runtime_version.txt, sku.txt, scenario.txt). Each file lists possible values (like python|3.11 or dotnet|8.0) and short codes for resource naming.
- Template Generation: A script reads these meta files and uses them to produce Bicep templates, one template per valid combination, deploying receiver function apps into dedicated resource groups.
- Filters: Regex-like patterns in a filter.txt file exclude unwanted combos, keeping the matrix manageable.
- CI/CD Flow: Whenever we add a new runtime or region, a pull request updates the relevant meta file. Once merged, our pipeline regenerates Bicep and redeploys resources (these are idempotent updates).

Test Infra Generator

Deploys and configures the Scheduler Function App, the SBLoadGen Durable Functions app, and the ALT webhook function. It follows a similar CI/CD approach: merging changes triggers the creation (or update) of these infrastructure components.

Test Infra: Load Generation, Scheduling, and Reporting

Scheduler

The conductor of the whole operation. It runs every 5 minutes and loads the test configurations (test_configs.json) from Blob Storage. The configuration includes details on which tests to run and at what time (e.g., "run at 13:45 daily"), and references either ALT for HTTP tests or SBLoadGen for non-HTTP tests, so that each test is scheduled on the right system. Some tests run multiple times daily, others once a day; a scheduled downtime is built in for maintenance.

HTTP Load Generator - Azure Load Testing (ALT)

We use Azure Functions to trigger Azure Load Testing (ALT) runs for HTTP-based scenarios. ALT is a production-grade load generator that provides an easy-to-configure way to send load to different server endpoints using JMeter and Locust. We worked closely with the ALT team to optimize the JMeter scripts for different scenarios, and it recently completed its second year. We created an abstraction on top of ALT that starts tests through a webhook and gets notified when they finish. This is implemented as a custom function app that does the following:
- Initiate a test run using a predefined JMX file.
- Continuously poll until the test execution is complete.
- Retrieve the test results and transform them into the required format.
- Transmit the formatted results to the data aggregation system.

Sample ALT Test Run: 8.8 million requests in under 6 minutes, with a 90th percentile response time of 80ms and zero errors. The system maintained a throughput of 28K+ RPS.

Some more details on how we configured ALT:
- 25 Runtime Controllers manage the test logic and concurrency.
- 40 Engines handle actual load execution, distributing test plans.
- 1,000 Clients total for 5-minute runs to measure throughput, error rates, and latency.
- Test Types: HelloWorld (GET request, to understand the baseline of the system) and HtmlParser (POST request sending HTML for parsing, to simulate moderate CPU usage).

Service Bus Load Generator - SBLoadGen (Durable Functions)

For event-driven scenarios (e.g., Service Bus-based triggers), we built SBLoadGen. It's a Durable Function that uses the fan-out pattern to distribute work across multiple workers, each responsible for sending a portion of the total load. In a typical run, we aim to generate around one million messages in under a minute to stress-test the system. We intentionally avoid a fan-in step: once messages are in flight, the system defers to the receiver function apps to process them and emit relevant telemetry.

Highlights:
- Generates ~1 million messages in under a minute.
- Durable Function apps are deployed regionally and are triggered via webhook.
- Implemented as a Python Function App using Model V2.

Note: SBLoadGen will be open sourced in the coming days.

Receiver Function Apps (Test apps)

These are the actual apps receiving all the generated load. They are deployed in different combinations and updated rarely. Each valid combination (region + OS + runtime + SKU + scenario) gets its own function app, receiving load from ALT or SBLoadGen.

HTTP Scenarios:
- HelloWorld: No-op test to measure the overhead and baseline of the system.
- HTML Parser: POST with an HTML document for parsing (simulating a small CPU load).
Non-HTTP (Service Bus) Scenario:
- CSV-to-JSON plus blob storage operations, blending compute and I/O overhead.

Collected Metrics:
- RPS: Requests per second, success/error rates, and latency distributions for HTTP workloads.
- MPPS: Messages processed per second and success/error rates for non-HTTP (e.g., Service Bus) workloads.

Data Aggregation & Dashboards

Capturing results at scale is just as important as generating load. PerfBenchV2 uses a modular data pipeline to reliably ingest and visualize metrics from both HTTP and Service Bus-based tests. All test results flow through Event Hubs, which act as an intermediary between the test infrastructure and our analytics platform. The webhook function (used with ALT) and the SBLoadGen app both emit structured logs that are routed through Event Hub streams and ingested into dedicated Azure Data Explorer (ADX) tables.

We use three main tables in ADX:
- HTTPTestResults for test runs executed via Azure Load Testing.
- SBLoadGenRuns for recording message counts and timing data from Service Bus scenarios.
- SchedulerRuns to log when and how each test was initiated.

On top of this telemetry, we've built custom ADX dashboards that allow us to monitor trends in latency, throughput, and error rates over time. These dashboards provide clear, actionable views into system behavior across dozens of runtimes, regions, and SKUs. Because our focus is on long-term trend analysis rather than real-time anomaly detection, this batch-oriented approach works well and reduces operational complexity.

CI/CD Pipeline Integration

- Continuous Updates: Once a new language version or scenario is added to the runtime_version.txt or scenario.txt meta files, the pipeline regenerates Bicep and deploys new receiver apps. The Test Infra Generator also updates or redeploys the needed function apps (Scheduler, SBLoadGen, or ALT webhook) whenever logic changes.
- Release Confidence: We run throughput tests on these new apps early and often, catching any performance regressions before shipping to customers.

Challenges & Lessons Learned

Designing and running this infrastructure hasn't been easy, and we've learned a lot of valuable lessons along the way. Here are a few:
- Exploding Matrix: Handling every runtime, OS, SKU, region, and scenario can lead to thousands of permutations. Meta files and a robust filter system help keep this under control, but it remains an ongoing effort.
- Cloud Transience: With ephemeral infrastructure, tests sometimes fail due to network hiccups or short-lived capacity constraints. We built in retries and redundancy to mitigate transient failures.
- Early Adoption: PerfBench was among the first heavy "customers" of the new Flex Consumption plan. At times, we had to wait for Bicep features or platform fixes, but it gave us great insight into the plan's real-world performance.
- Maintenance & Cleanup: When certain stacks or SKUs near end-of-life, we have to decommission their resources. This also means regular grooming of meta files and filter rules.

Success Stories

- Proactive Regression Detection: PerfBench surfaced critical performance regressions early, often before they could impact customers. These insights enabled timely fixes and gave us confidence to move forward with the General Availability of Flex Consumption.
- Production-Level Confidence: By continuously running tests across live production regions, PerfBench provided a realistic view of system behavior under load. This allowed the team to fine-tune performance, eliminate bottlenecks, and achieve improvements measured in single-digit milliseconds.
- Influencing Product Evolution: As one of the first large-scale internal adopters of the Flex Consumption plan, PerfBench served as a rigorous validation tool. The feedback it generated played a direct role in shaping feature priorities and improving platform reliability, well before broader customer adoption.
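To make the "Exploding Matrix" lesson concrete, here is a minimal sketch of how a set of meta files and exclusion patterns could be expanded into a filtered test matrix. It is not the PerfBench implementation: the dimension values, the slash-joined key format, and the two filter patterns are all assumptions made for illustration, since the real meta-file and filter.txt formats aren't shown in this post.

```python
import itertools
import math
import re

# Hypothetical meta-file contents; the real os.txt, runtime_version.txt,
# sku.txt and scenario.txt formats are not shown in the post.
META = {
    "os": ["linux", "windows"],
    "runtime": ["python|3.11", "dotnet|8.0", "node|20"],
    "sku": ["FLEX2048", "FLEX512", "EP1"],
    "scenario": ["helloworld", "htmlparser"],
}

# Hypothetical exclusion patterns standing in for filter.txt.
FILTERS = [
    r"^windows/.*/FLEX",        # assumption: skip Flex SKUs on Windows
    r"/dotnet\|8\.0/FLEX512/",  # assumption: skip this runtime/SKU pairing
]


def expand(meta, filters):
    """Return every combination whose slash-joined key matches no filter."""
    compiled = [re.compile(pattern) for pattern in filters]
    keys = ("/".join(combo) for combo in itertools.product(*meta.values()))
    return [key for key in keys if not any(p.search(key) for p in compiled)]


# Raw matrix size is the product of the number of values per dimension.
all_count = math.prod(len(values) for values in META.values())
kept = expand(META, FILTERS)
print(f"{all_count} raw combinations, {len(kept)} after filtering")
```

Even with four small dimensions the raw matrix is already 36 entries, and each new region or runtime multiplies it again, which is why exclusion rules have to be part of the generator instead of an afterthought.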
Future Directions

- Open sourcing: We are in the process of open sourcing all the relevant parts of PerfBench, including SBLoadGen and the Bicep template generator.
- Production Synthetic Validation and Alerting: Adapting PerfBench's resource generation approach for ongoing synthetic tests in production, ensuring real environments consistently meet performance SLOs. This will also open up alerting and monitoring scenarios across the production fleet.
- Expanding Trigger Coverage and Variations: Exploring additional triggers, such as Storage queue or Event Hub triggers, to broaden test coverage, and testing different settings within the same scenario (e.g., larger payloads, concurrency changes).

Conclusion

PerfBench underscores our commitment to high-performance Azure Functions. By automating test app creation (via meta files and Bicep), orchestrating load (via ALT and SBLoadGen), and collecting data in ADX, we maintain a continuous pulse on throughput. This approach has already proven invaluable for Flex Consumption, and we're excited to expand scenarios and triggers in the future.

For more details on Flex Consumption and other hosting plans, check out the Azure Functions Documentation. We hope the insights shared here spark ideas for your own large-scale performance testing needs, whether on Azure Functions or any other distributed cloud services.

Acknowledgements

We'd like to acknowledge the entire Functions Platform and Tooling teams for their foundational work in enabling this testing infrastructure. Special thanks to the Azure Load Testing (ALT) team for their continued support and collaboration. And finally, sincere appreciation to our leadership for making performance a first-class engineering priority across the stack.
Further Reading

- Azure Functions
- Azure Functions Flex Consumption Plan
- Azure Durable Functions
- Azure Functions Python Developer Reference Guide
- Azure Functions Performance Optimizer
- Example case study: GitHub and Azure Functions
- Azure Load Testing Overview
- Azure Data Explorer Dashboards

If you have any questions or want to share your own performance testing experiences, feel free to reach out in the comments!

Introducing AI-Powered Actionable Insights in Azure Load Testing
We're excited to announce the preview of AI-powered Actionable Insights in Azure Load Testing, a new capability that helps teams quickly identify performance issues and understand test results through AI-driven analysis.

Performance testing is an essential part of ensuring application reliability and responsiveness, but interpreting the results can often be challenging. It typically involves manually correlating client-side load test telemetry with backend service metrics, which can be both time-consuming and error-prone. Actionable Insights simplifies this process by automatically analyzing test data, surfacing key issues, and offering clear, actionable recommendations, so teams can focus on fixing what matters instead of sifting through raw data.

AI-powered diagnostics

Actionable Insights uses AI to detect performance issues such as latency spikes, failed requests, throughput anomalies, and resource bottlenecks. It presents insights clearly, highlighting patterns and root causes so teams can quickly understand what went wrong and how to fix it. Insights draw on telemetry from both client-side metrics and server-side metrics, the latter collected via Azure Monitor. When server-side monitoring is enabled, Azure Load Testing correlates frontend traffic patterns with backend system behavior. For example, if an increase in virtual users coincides with latency spikes in Azure Cosmos DB, the insight will highlight this relationship and suggest corrective actions, giving teams a comprehensive view of system behavior under load. You can learn how to enable server-side metrics here.

Rich, integrated experience for faster issue resolution

Actionable Insights provides a unified, intuitive experience within your test results, clearly illustrating the context of detected performance issues. By consolidating metrics, conditions, and recommendations into a single view, your team can diagnose and resolve issues faster, without switching tools or piecing data together manually.
Get Started

Actionable Insights is now available in preview. To try it out, trigger a new test run in Azure Load Testing. For best results, enable server-side metrics when configuring your test. Once the run completes, AI-powered insights will be available in the test results view, with no additional setup required.

This is just the beginning. We are actively working on improving the quality of these insights and adding more capabilities. Your feedback is essential. Let us know what's working well and where we can improve by using the thumbs-up or thumbs-down option on each generated insight in the Azure Load Testing portal. You can also share your feedback on our community.

Learn more about Actionable Insights

Optimize Azure Functions for Performance and Costs using Azure Load Testing
Performance optimizer is a tool that helps you find the optimal balance between cost and performance for your Azure Functions. It runs load tests on different configurations and recommends the best one for your app.

Announcing CI/CD Enhancements for Azure Load Testing
We are excited to announce a significant update to our Azure Load Testing service, aimed at enhancing the experience of setting up and running load tests from CI/CD systems, including Azure DevOps and GitHub. This update is a direct response to customer feedback and is designed to streamline the process, making it more efficient and user-friendly.

Key Features and Improvements:

Enhanced CI/CD Integration: Developers and testers can now configure application components and the metrics to monitor directly from a CI/CD pipeline. This integration allows monitoring the application infrastructure during the test run. You can make the following changes to your load test YAML config.

appComponents:
  - resourceId: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/samplerg/providers/microsoft.insights/components/appComponentResource"
    resourceName: appComponentResource # Optional
    kind: web # Optional
    metrics:
      - name: "requests/duration"
        namespace: microsoft.insights/components
        aggregation: "Average"
      - name: "requests/count"
        aggregation: "Total"
        namespace: microsoft.insights/components
  - resourceId: "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/samplerg/providers/microsoft.insights/components/appComponentResource"
    resourceName: appComponentResource # Optional
    kind: web # Optional
    metrics:
      - name: "requests/duration"
        aggregation: "Average"
        namespace: microsoft.insights/components
      - name: "requests/count"
        aggregation: "Total"
        namespace: microsoft.insights/components

Pass/Fail Criteria on Server Metrics: Users can set pass/fail criteria on server metrics from a CI/CD pipeline, providing more granular control over test outcomes. This feature helps in maintaining high performance standards by automatically flagging any performance issues. You can make the following changes to your load test YAML config.
failureCriteria:
  clientMetrics:
    - avg(responseTimeMs) > 300
    - percentage(error) > 50
    - getCustomerDetails: avg(latency) > 200
  serverMetrics:
    - resourceId: /subscriptions/abcdef01-2345-6789-0abc-def012345678/resourceGroups/sample-rg/providers/Microsoft.Compute/virtualMachines/sample-vm
      metricNamespace: Microsoft.Compute/virtualMachines
      metricName: Percentage CPU
      aggregation: Average
      condition: GreaterThan
      value: 80
    - resourceId: /subscriptions/abcdef01-2345-6789-0abc-def012345678/resourceGroups/sample-rg/providers/Microsoft.Compute/virtualMachines/sample-vm
      metricNamespace: Microsoft.Compute/virtualMachines
      metricName: Available Memory
      aggregation: Average
      condition: LessThan
      value: 20

Parameter Overrides: The ability to override parameters of a load test configuration YAML from the Azure DevOps task or GitHub action adds flexibility and customization to the testing process.

Output Variables: The Azure DevOps task now includes output variables that can be consumed in downstream steps, jobs, and stages. This makes it possible to take further actions on the load test results within the pipeline.

Pipeline Cancellation: If a pipeline in Azure Pipelines or a workflow in GitHub is cancelled, any load test triggered by the pipeline or action will also be cancelled. This avoids incurring costs for unnecessary tests.

Traceability and Results Viewing: Users can trace a test run back to the pipeline that ran it from the Azure portal. This end-to-end traceability helps in understanding what changes might have triggered a test failure.

Conclusion

These enhancements are designed to provide a more integrated and efficient load testing experience for our users. We believe that these updates will help developers, testers, and DevOps engineers better manage their load testing processes, ensuring high performance and reliability of their applications. We look forward to your feedback and are excited to see how these new features will improve your CI/CD workflows.
Stay tuned for more updates and happy testing!
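As a reading aid for the serverMetrics pass/fail criteria shown in the YAML earlier in this post, the sketch below treats each criterion as a simple threshold check on an aggregated metric value. It only illustrates what GreaterThan and LessThan mean here; it is not how Azure Load Testing evaluates criteria internally. The ServerCriterion class, the is_violated helper, and the observed values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ServerCriterion:
    """Hypothetical mirror of one serverMetrics entry from the YAML config."""
    metric_name: str
    aggregation: str
    condition: str  # "GreaterThan" or "LessThan", as in the YAML
    value: float


def is_violated(criterion, observed):
    """Return True when the observed aggregate breaches the criterion."""
    if criterion.condition == "GreaterThan":
        return observed > criterion.value
    if criterion.condition == "LessThan":
        return observed < criterion.value
    raise ValueError(f"unsupported condition: {criterion.condition}")


criteria = [
    ServerCriterion("Percentage CPU", "Average", "GreaterThan", 80),
    ServerCriterion("Available Memory", "Average", "LessThan", 20),
]

# Made-up aggregated values for one test run, purely for illustration.
observed = {"Percentage CPU": 72.5, "Available Memory": 12.0}

failed = [c.metric_name for c in criteria
          if is_violated(c, observed[c.metric_name])]
print("FAILED" if failed else "PASSED", failed)
```

In this made-up run the CPU average stays under its limit while available memory drops below its floor, so the run would be flagged on the memory criterion alone.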