Azure Functions is Azure’s primary serverless service, used in production by hundreds of thousands of customers who run trillions of executions on it every month. It was first released in early 2016, and since then we have learned a lot from our customers about what works and where they would like to see more.
Taking all of this feedback into consideration, the Azure Functions team has worked hard to improve the experience across the stack, from the initial getting started experience all the way to running at very high scale, while at the same time adding features to help customers build AI apps. Please see this link for a list of all the capabilities we have released at this year’s Build conference. Taken together, this is one of the most significant sets of releases in Functions history.
In this blog post, I will share customer feedback and the behind-the-scenes technical work that the Functions team and partner teams did to meet our customers' expectations. In future posts we will go deeper into each of these topics; this is a brief overview.
Flex Consumption: Burst scale your apps with networking support
We are releasing a new SKU of Functions, Flex Consumption. This SKU addresses a lot of the feedback that we have received over the years on the Functions Consumption plans - including faster scale, more instance sizes, VNET support, higher instance limits and much more. We have looked at each part of the stack and made improvements at all levels. There are many new capabilities including:
- Scales much faster than before, with user-controlled per-instance concurrency
- Scale to many more instances than before (up to 1000)
- Serverless “scale to zero” SKU that also supports VNET-integrated event sources
- Supports always allocated workers
- Supports multiple memory sizes
Purpose-built backend “Legion”
To enable Flex Consumption, we have created a brand-new purpose-built backend internally called Legion.
To host customer code, Legion relies on nested virtualization on Azure VMSS. This gives us the Hyper-V isolation that is a prerequisite for hostile multi-tenant workloads. Legion was built from the outset to support scaling to thousands of instances with VNET injection. Efficient use of subnet IP addresses through kernel-level routing was another unique achievement in Legion.
Functions has a strict cold start goal for every language. To meet this goal across all languages and versions, and to support image updates for all of these variants, we created a construct called Pool Groups that lets Functions specify all the parameters of a pool, as well as its networking and upgrade policies.
All this work gave us a solid, scalable and fast infrastructure on which to build Flex Consumption.
“Trigger Monitor” – scale to 0 and scale out with network restrictions
Flex Consumption also introduces networking features to limit access to the Function app and to trigger on event sources that are network restricted. Because these event sources are network restricted, the multi-tenant scaling component (the scale controller) that monitors the rate of events to decide whether to scale out or scale in cannot access them. In the Elastic Premium plan, which scales down to one instance, we solved this by having that instance access the network-restricted event source and communicate scale decisions to the scale controller. In the Flex Consumption plan, however, we wanted to scale down to zero instances.
To solve this, we implemented a small scaling component we call “Trigger Monitor” that is injected into the customer's VNET. This component is able to access the network-restricted event source, and the scale controller now communicates with it to get scaling decisions.
Scaling Http based apps based on concurrency
When scaling Http based workloads on Function apps, our previous implementation used an internal heuristic to decide when to scale out. This heuristic was based on Front End servers pinging the workers that were currently running the customer's workload and deciding to scale based on the latency of the responses. This implementation used SQL Azure to track workers and their assignments.
In Flex Consumption we have rewritten this logic so that scaling is now based on user-configured concurrency. User-configured concurrency gives customers the flexibility to decide, based on their language and workload, what concurrency they want per instance. For example, Python customers don't have to think about multithreading and can set concurrency = 1 (which is also the default for Python apps). This approach makes the scaling behavior predictable, and it gives customers the ability to control the cost vs. performance tradeoff: if they are willing to tolerate potentially higher latency, they may unlock cost savings by running each worker at a higher level of concurrency.
In our implementation, we use "request slots" that are managed by the Data Role. We split instances into request slots and assign them to different Front End servers. For example, if the per-instance concurrency is set to 16, then once the Data Role chooses an instance to allocate a Function app to, there are 16 request slots that it can hand out to Front Ends. It might give all 16 to a single Front End, or share them across multiple Front Ends. This removes the need for any coordination between Front Ends: they can use the request slots they receive as much as they like, with the restriction of only one concurrent request per request slot. This implementation also uses Cosmos DB to track assignments and workers.
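To make the request-slot idea concrete, here is a simplified sketch with hypothetical types (not the actual Data Role code) of how an instance's configured concurrency could be split into slots and leased out to Front Ends:

using System.Collections.Generic;

// Simplified illustration of request slots (hypothetical types, not the
// actual Data Role implementation).
public record RequestSlot(string InstanceId, int SlotId);

public class InstanceAllocator
{
    private readonly Queue<RequestSlot> _freeSlots = new();

    // When an instance is allocated to an app, split its configured
    // per-instance concurrency into individual request slots.
    public void AddInstance(string instanceId, int perInstanceConcurrency)
    {
        for (var slot = 0; slot < perInstanceConcurrency; slot++)
            _freeSlots.Enqueue(new RequestSlot(instanceId, slot));
    }

    // Lease up to 'count' slots to a Front End. Each slot permits exactly one
    // concurrent request, so Front Ends need no coordination with each other.
    public IReadOnlyList<RequestSlot> LeaseSlots(int count)
    {
        var leased = new List<RequestSlot>();
        while (leased.Count < count && _freeSlots.Count > 0)
            leased.Add(_freeSlots.Dequeue());
        return leased;
    }
}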
Legion as the compute provider, a significantly larger compute allocation per app, and rapid scale-in and capacity reclamation together allow us to give customers a much better experience than before.
Scaling Non-Http based apps based on concurrency
Similar to Http apps, we have also enabled Non-Http apps to scale based on concurrency. We refer to this as Target Based Scaling. From an implementation perspective, each trigger extension now implements its own scaling logic and the scale controller hosts these extensions. This unifies the scaling logic in one place and bases all scaling on concurrency.
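As a rough illustration of the concurrency-based idea, the desired instance count can be derived from the event backlog and the configured per-instance target. This is a simplified formula, not the actual scale controller logic, which weighs additional signals:

using System;

// Simplified illustration of target-based scaling: desired instances equal the
// event backlog divided by the per-instance execution target, clamped to the
// plan's instance limit. (Not the actual scale controller code.)
public static class TargetScaleSketch
{
    public static int GetTargetInstanceCount(
        long eventBacklogLength,          // e.g. queue length or unprocessed event count
        int targetExecutionsPerInstance,  // configured per-instance concurrency target
        int maxInstances)
    {
        var desired = (int)Math.Ceiling(
            (double)eventBacklogLength / targetExecutionsPerInstance);
        return Math.Clamp(desired, 0, maxInstances);
    }
}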
Moving configuration to the Control Plane
One more directional change that we are making, based on feedback from our customers, is moving various configuration properties from AppSettings to the Control Plane. For Public Preview we are doing this for the areas of Deployment, Scaling, and Language. This is an example configuration which shows the new Control Plane properties. By GA we will move other properties as well.
Azure Load Testing integration
Customers have always asked us how to configure their Function apps for optimum throughput. Until now, we have only been able to give them guidance on running performance tests on their own. Now they have another option: we are introducing native integration with Azure Load Testing. A new performance optimizer enables you to find the right configuration for your app by helping you create and run tests across different memory and Http concurrency configurations.
Functions on Azure Container Apps: Cloud-native microservices deployments
At Build we are also announcing the GA of Functions running on Azure Container Apps. This new SKU allows customers to run their apps using the Azure Functions programming model and event-driven triggers alongside other microservices or web applications co-located in the same environment. It lets customers leverage common networking resources and observability across all their applications. Furthermore, it allows Functions customers to leverage frameworks (like Dapr) and compute options (like GPUs) that are only available in Container Apps environments.
We had to keep this SKU consistent with other Function SKUs/plans, even though it runs and scales on a different platform (Container Apps). In particular:
- We created a new database for this SKU that can handle different schema needs (because of the differences in the underlying infra compared to regular Functions) and improved the query performance. We also redesigned some parts of the control plane for Functions on ACA.
- We used ARM extension routing to securely route traffic to the host and to enable Functions Host APIs via ARM for apps running inside an internal VNET.
- We built a sync trigger service inside the Azure Container Apps environment that detects the Function App, reads trigger information from the customer's functions code, and automatically creates corresponding KEDA scaler rules for the Function App. This enables automatic scaling of Function Apps on Azure Container Apps (ACA) without customers having to know about the underlying KEDA scaling platform.
- We developed a custom KEDA external scaler to support the scale-to-zero scenario for Timer trigger functions.
VSCode.Web support: Develop your functions in the browser
The Azure Functions team values developer productivity, and our VSCode integration and Core Tools are top-notch and are one of our main experience advantages over similar products in this category. Even so, we are always striving to enhance this experience.
It is often challenging for developers to configure their local dev machine with the right prerequisites before they can begin, and this setup also needs to be kept up to date with new versions of local tools and languages. Meanwhile, GitHub Codespaces and similar developer environments have demonstrated that effective development environments can be hosted in the cloud.
We are launching a new getting started experience for Azure Functions using VS Code for the Web. This experience allows developers to write, debug, test and deploy their function code directly from the browser using VS Code for the Web connected to container-based compute, which is the exact same experience a developer would have locally. The container comes ready with all the required dependencies and supports the rich features offered by VS Code, including extensions. This experience can also be used for function apps that already have code deployed to them.
To build this functionality, we built an extension that launches VS Code for the Web, a lightweight VS Code that runs in the user's browser. This VS Code client communicates with the Azure Functions backend infrastructure to establish a connection to a VS Code server using a Dev Tunnel. With the VS Code client and server connected via the Dev Tunnel, the user can edit their functions as desired.
OpenAI extension to build AI apps effortlessly
Azure Functions aims to simplify the development of many types of apps, such as web apps, data pipelines and other related workloads, and AI apps are a clear new domain. Azure Functions has a rich extensibility model that helps developers abstract away many of the mundane tasks required for integration while making each capability available in all the languages that Functions supports.
We are releasing an extension on top of OpenAI which enables the following scenarios in just a few lines of code:
- Retrieval Augmented Generation (Bring your own data)
- Text completion and Chat Completion
- Assistants capability
The key here is that developers can build AI apps in any language of their choice that is supported by Functions, hosted in a service that can be up and running within minutes.
Have a look at the following C# code snippet. In a few lines of code, this HTTP trigger function takes a query prompt as input, pulls semantically similar document chunks into a prompt, and then sends the combined prompt to OpenAI. The results are made available to the function, which simply returns the chat response to the caller.
// Request payload: the user's query prompt.
public class SemanticSearchRequest
{
    [JsonPropertyName("Prompt")]
    public string? Prompt { get; set; }
}

[Function("PromptFile")]
public static IActionResult PromptFile(
    // HTTP trigger that accepts the prompt in the POST body.
    [HttpTrigger(AuthorizationLevel.Function, "post")] SemanticSearchRequest unused,
    // Semantic search binding: embeds the prompt, retrieves similar document chunks
    // from the "openai-index" index, and sends the combined prompt to the chat model.
    [SemanticSearchInput("AISearchEndpoint", "openai-index", Query = "{Prompt}", ChatModel = "%CHAT_MODEL_DEPLOYMENT_NAME%", EmbeddingsModel = "%EMBEDDING_MODEL_DEPLOYMENT_NAME%")] SemanticSearchContext result)
{
    // Return the model's chat response to the caller as plain text.
    return new ContentResult { Content = result.Response, ContentType = "text/plain" };
}
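For comparison, here is a hedged sketch of the simpler text-completion scenario. The TextCompletionInput binding and its property names follow the extension's samples, but treat the exact names as illustrative since they can differ across extension versions:

// Sketch of a text-completion function using the OpenAI extension's
// TextCompletionInput binding (attribute and property names follow the
// extension samples and may vary by version). "{name}" comes from the HTTP route.
[Function("WhoIs")]
public static IActionResult WhoIs(
    [HttpTrigger(AuthorizationLevel.Function, "get", Route = "whois/{name}")] HttpRequest req,
    [TextCompletionInput("Who is {name}?", ChatModel = "%CHAT_MODEL_DEPLOYMENT_NAME%")] TextCompletionResponse response)
{
    // Return the completion text to the caller.
    return new ContentResult { Content = response.Content, ContentType = "text/plain" };
}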
The challenge in building an extension is making sure that it hides enough of the “glue code” while at the same time giving the developer enough flexibility for their business use case.
Furthermore, these were some additional challenges we faced:
- To save state across invocations in the chat completion scenarios, we experimented with various implementations, including Durable Functions, and finally moved to using Table storage to preserve state during conversations.
- We had to figure out which embeddings stores to support; we currently support Azure AI Search, Cosmos DB and Azure Data Explorer.
- Like any technology that is moving fast, we had to figure out the right strategy for using the underlying OpenAI models and SDKs.
Streaming support in Node and Python
Another long-requested capability added at Build is HTTP streaming support in Node (GA) and Python (preview).
With this feature, customers can stream HTTP requests to and responses from their Function Apps, using the request and response APIs exposed to functions. Previously, the amount of data that could be transmitted in an HTTP request was limited by the instance memory size of the SKU. With HTTP streaming, large amounts of data can be processed in chunks. Especially relevant today, this feature enables new scenarios for AI apps, including processing large amounts of data, streaming OpenAI responses, and delivering dynamic content.
The journey to enable streaming support is interesting. It started with us aiming for parity between the in-proc and isolated models for .NET. To achieve this we implemented a new Http pipeline wherein the Http request is proxied from the Functions Host to the isolated worker. We were then able to piggyback on the same technology to build streaming support for the other out-of-proc languages.
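As a rough illustration of what that proxied pipeline enables, here is a minimal C# sketch of a .NET isolated function writing its response in chunks. It assumes the isolated worker with ASP.NET Core integration (and the usual usings for it); the Node and Python APIs expose equivalent stream objects in their own idioms:

// Minimal streaming sketch for the .NET isolated worker with ASP.NET Core
// integration: write the response in chunks instead of buffering it in memory.
[Function("StreamedResponse")]
public static async Task<IActionResult> StreamedResponse(
    [HttpTrigger(AuthorizationLevel.Function, "get")] HttpRequest req)
{
    var response = req.HttpContext.Response;
    response.ContentType = "text/plain";

    for (var i = 0; i < 10; i++)
    {
        // Each chunk is flushed to the client as soon as it is written.
        await response.WriteAsync($"chunk {i}\n");
        await response.Body.FlushAsync();
    }

    // The body has already been streamed, so there is nothing left to return.
    return new EmptyResult();
}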
OpenTelemetry support
At Build we are releasing support for OpenTelemetry in Functions. This allows customers to export telemetry data from both the Functions Host and the language workers using OpenTelemetry semantics. Here are some of the interesting design directions we took for this work:
- Customer code can ignore the Functions host; the telemetry context is re-created in each language worker for a smooth experience.
- Telemetry is the same for Application Insights and other vendors: customers get the same telemetry data no matter which they use. Live Logs works with Application Insights, but the overall experience doesn't otherwise change.
- To make things easier for our customers, each language worker has a package/module that removes the extra wiring code; a rough sketch for a .NET isolated app follows below.
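For example, in a .NET isolated app the worker-side wiring might look roughly like the sketch below. It uses the standard OpenTelemetry .NET hosting APIs; the Functions worker OpenTelemetry package layers the Functions-specific pieces on top of this, and the ActivitySource name shown is an assumption, so treat the snippet as illustrative rather than as the exact API:

// Illustrative Program.cs for a .NET isolated worker exporting traces with
// OpenTelemetry. The Functions worker OpenTelemetry package adds the
// Functions-specific wiring; the source name below is assumed for illustration.
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

var host = new HostBuilder()
    .ConfigureFunctionsWorkerDefaults()
    .ConfigureServices(services =>
    {
        services.AddOpenTelemetry()
            .ConfigureResource(resource => resource.AddService("my-function-app"))
            .WithTracing(tracing => tracing
                .AddSource("Microsoft.Azure.Functions.Worker") // assumed source name
                .AddOtlpExporter()); // any OTLP-compatible backend, not only Application Insights
    })
    .Build();

host.Run();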
Thank you and going forward
Thank you to all the customers and developers who have used Azure Functions through the years. We would love for you to try out these new features and capabilities and provide feedback and suggestions.
Going forward we will be working on:
- Getting Flex Consumption to GA to enable more scale for our most demanding customers, and continuing to make improvements in the meantime.
- Continuing to enhance the OpenAI extension with more scenarios and models to make Azure Functions the easiest and fastest way to create an AI service.
- Continuing to enhance our getting started experience and taking VSCode.Web integration to more languages and to GA.
- Adding streaming support to other languages, including Java.