Our latest work to improve Azure Functions cold starts
Published Jun 10 2024 11:08 PM
Microsoft

We continually work to improve performance and mitigate Azure Functions cold starts - the extra time it takes for a function that hasn’t been used recently to respond to an event. We understand that no matter when your functions were last called, you want fast executions and little lag time.

In this article:

  • How we measure cold start and the work done to improve it in the Azure Functions platform.
  • What you can do to optimize your functions to improve your app’s cold start performance.
  • Provide your feedback on Azure Functions cold start.

 

How we measure Azure Functions cold start

In measuring Azure Functions performance, we prioritize the cold start of synchronous HTTP triggers in the Consumption and Flex Consumption hosting plans. That means looking at what our platform and Azure Functions host need to do to execute the first HTTP trigger function on a new instance. Then we improve it. We are also working to improve cold start for asynchronous scenarios.

To assess our progress, we run sample HTTP trigger function apps that measure cold start latencies for all supported versions of Azure Functions, in all languages, for both Windows and Linux Consumption. These sample apps are deployed in all Azure regions and subregions where Azure Functions runs. Our test function calls these sample apps every few hours to trigger a true cold start and currently generates nearly 85,000 daily cold start samples. Over the past 18 months, this testing infrastructure has shown a reduction in cold start latency of approximately 53 percent across all regions and for all supported languages and platforms.

If any of the tracked metrics start to regress, we’re immediately notified and start investigating. Daily emails, alerts, and historical dashboards tell us the end-to-end cold start latencies across various percentiles. We also perform specific analyses and trigger alerts if our fiftieth percentile, ninety-ninth percentile, or maximum latency numbers regress.
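
To make the percentile tracking concrete, here is a minimal Python sketch of how fiftieth-percentile, ninety-ninth-percentile, and maximum latencies could be computed from a batch of cold start samples and compared against a baseline. It is an illustration only, not our internal tooling: the function names, the nearest-rank percentile method, and the 10 percent tolerance are all assumptions chosen for the example.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(samples):
    """Summarize cold start latencies (for example, in milliseconds)."""
    return {
        "p50": percentile(samples, 50),
        "p99": percentile(samples, 99),
        "max": max(samples),
    }


def regressions(baseline, current, tolerance=0.10):
    """Return the metric names whose current value exceeds the baseline by more than `tolerance`.

    The 10 percent default is an assumed threshold for illustration.
    """
    return [
        name
        for name in ("p50", "p99", "max")
        if current[name] > baseline[name] * (1 + tolerance)
    ]
```

A run in which the median is unchanged but the slowest samples get slower would flag only `p99` and `max`, which is why the tail is tracked separately from the median.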

In addition, we collect detailed PerfView profiles of the sample apps deployed in select regions. The breakdown includes full call stacks (user mode and kernel mode) for every millisecond spent during cold start. The profiles reveal CPU usage and call stacks, context switches, disk reads, HTTP calls, memory hard faults, common language runtime (CLR) just-in-time (JIT) compiler, garbage collector (GC), type loads, and many more details about .NET internals. We report all these details in our logging pipelines and receive alerts if metrics regress. And we’re always looking for ways to make improvements based on these profiles.

 

Performance improvements in the platform

Since launching Azure Functions, we’ve improved performance across the Azure platform that it runs on, in order to achieve the observed reduction in cold starts. These enhancements extended to the platform shared with Azure App Service and the new Legion platform, the operating system, storage, .NET Core, and communication channels.

We aim to optimize for the ninety-ninth–percentile latency. We delve into cold start scenarios at the millisecond level and continually fine-tune the algorithms that allocate capacity. In short, we’re always working to improve Azure Functions cold start. The following areas are our current focus:

  • Function app pools. In the internal architecture, we must ensure that the right number of Function app pools are warmed up and ready to handle a cold start for all supported platforms and languages. These pools serve as placeholders in effect. Exactly how many depends on the usage per region—plus enough extra capacity to meet unexpected bursts. We’re always refining our algorithms to balance the pools without increasing costs. Placeholder processes and dependencies stay hot in memory to prevent paging out.
  • Ninety-ninth–percentile latencies. Although it's relatively straightforward to optimize cold start scenarios for the fiftieth percentile, we are digging deeper to address ninety-ninth–percentile latencies, particularly when multiple VMs are involved. Each runs different processes and components and is configured with unique disk, network, and memory characteristics. It’s even harder to trace the root causes of potential ninety-ninth–percentile regressions.
  • Profilers. We use a multitude of specialized profiling tools capable of dissecting cold start scenarios at the millisecond level. We examine detailed call stacks and track activity at both the application and operating system levels. The PerfView and Event Tracing for Windows (ETW) providers are great at addressing issues with Windows and .NET-based apps, but we also investigate issues across platforms and languages. We also use Profile Guided Optimization (PGO) to ensure that Functions Host and dependent libraries are fully JIT compiled and ready, minimizing the impact of platform code JIT compilation during actual cold start requests.
  • Histograms. If our platform detects cold starts occurring at regular intervals, we fully prewarm the instance where the function app will run to avoid cold start delays during actual execution.
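
The histogram idea in the last bullet can be sketched in a few lines of Python. This is a simplified illustration, not the platform's actual algorithm: the function names, the 60-second bucket width, the 80 percent share threshold, and the 30-second lead time are all assumptions made for the example. It buckets the gaps between past cold starts, and if one bucket dominates, it treats the cold starts as regular and suggests when to prewarm.

```python
from collections import Counter


def dominant_interval(cold_start_times, bucket_seconds=60, min_share=0.8):
    """Return the typical gap (in seconds) between cold starts if they recur regularly, else None.

    `cold_start_times` is a list of timestamps in seconds, sorted in ascending order.
    """
    gaps = [later - earlier for earlier, later in zip(cold_start_times, cold_start_times[1:])]
    if len(gaps) < 3:
        return None  # too little history to call the pattern regular
    # Build a histogram of gaps, rounded to the nearest bucket.
    histogram = Counter(round(gap / bucket_seconds) for gap in gaps)
    bucket, count = histogram.most_common(1)[0]
    if bucket == 0 or count / len(gaps) < min_share:
        return None
    return bucket * bucket_seconds


def next_prewarm_time(cold_start_times, lead_seconds=30, **kwargs):
    """Suggest a timestamp at which to prewarm, shortly before the next expected cold start."""
    interval = dominant_interval(cold_start_times, **kwargs)
    if interval is None:
        return None
    return cold_start_times[-1] + interval - lead_seconds
```

For an app that is called roughly once an hour, the gaps fall into a single bucket and the sketch suggests prewarming about 30 seconds before the next expected call. Irregular traffic produces no dominant bucket, so no prewarm time is returned.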

 

6 things you can do now to improve cold start in Azure Functions

Here are a few strategies you can follow to further improve cold starts for your apps:

  1. Deploy your function as a .zip (compressed) package. Minimize its size by removing unneeded files and dependencies, such as debug symbols (.pdb files) and unnecessary image files.
  2. For Windows deployment, run your functions from a package file. To do this, set the WEBSITE_RUN_FROM_PACKAGE=1 app setting. If your app uses storage for storing content, deploy Azure Storage in the same region as your Azure Functions app and consider using premium storage for a faster cold start.
  3. When deploying .NET apps, publish with ReadyToRun to avoid additional costs from the JIT compiler.
  4. In the Azure portal, navigate to your function app. Go to Diagnose and solve problems, and review any messages that appear under Risk alerts. Look for issues that may impact cold starts.
  5. If your app uses a Premium or App Service plan, invoke warmup triggers to preload dependencies or to add any custom logic required to connect to external endpoints. This option isn’t supported for apps on Consumption plans.
  6. To help mitigate cold starts, try the always ready instances feature of our newest hosting option for event-driven serverless functions, Flex Consumption.
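
As an illustration of the first strategy, the following Python sketch builds a .zip package from a build output folder while skipping debug symbols and image files. It uses only the standard library. The `build_package` function name and the list of excluded extensions are assumptions for the example; adjust the list so that it never drops files your app actually needs at run time.

```python
import zipfile
from pathlib import Path

# Assumed list of file types to leave out; tailor it to your app.
EXCLUDED_SUFFIXES = {".pdb", ".png", ".jpg", ".jpeg", ".gif"}


def build_package(source_dir, package_path, excluded=EXCLUDED_SUFFIXES):
    """Zip `source_dir` into `package_path`, skipping excluded file types.

    Returns the archive-relative paths that were included, in sorted order.
    """
    source = Path(source_dir)
    package = Path(package_path).resolve()
    included = []
    with zipfile.ZipFile(package, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source.rglob("*")):
            # Skip directories and the package itself if it lives inside source_dir.
            if not path.is_file() or path.resolve() == package:
                continue
            if path.suffix.lower() in excluded:
                continue
            relative = path.relative_to(source).as_posix()
            archive.write(path, relative)
            included.append(relative)
    return included
```

Calling `build_package("publish", "package.zip")` on a hypothetical `publish` folder returns the list of files that went into the archive, which makes it easy to review what is being shipped before you deploy.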

 

Final Thoughts

If your Azure Functions app still doesn’t perform as well as you’d like, consider the following:

Note: This article is a modified version of the article originally published on Newsstack.

 

 

Last update: Jun 10 2024 04:55 PM