In May, dynamic concurrency (abbreviated as DC in this blog) in Azure Functions became generally available. With dynamic concurrency enabled, the platform adjusts the concurrency of functions dynamically, as long as the worker instance stays healthy (for example, CPU and thread utilization are within healthy limits). For more information about dynamic concurrency, see: https://docs.microsoft.com/en-us/azure/azure-functions/functions-concurrency#dynamic-concurrency.
In this blog, I introduce how DC works in Azure function apps, with some tests.
How to enable Dynamic Concurrency?
By default, dynamic concurrency is disabled. With dynamic concurrency enabled, concurrency starts at 1 for each function and is adjusted up to an optimal value, which is determined by the host. Here, the concurrency value is per instance: it is the number of concurrent invocations of a function on one instance.
You can enable dynamic concurrency in your function app by adding the following settings in your host.json file:
{
  "version": "2.0",
  "concurrency": {
    "dynamicConcurrencyEnabled": true,
    "snapshotPersistenceEnabled": true
  }
}
When snapshotPersistenceEnabled is true, which is the default, the learned concurrency values are periodically persisted to storage, so new instances start from those values instead of starting from 1 and having to redo the learning.
Which languages and triggers does DC support?
Dynamic concurrency supports all languages that function apps support.
Dynamic concurrency is currently only supported for the Azure Blob, Azure Queue, and Service Bus triggers. However, it has requirements on the extension versions:
- For .NET apps (compiled assemblies), it requires version 5.x of the Storage extension and version 5.x of the Service Bus extension.
- For languages such as Node.js and Python, it requires extension bundle version 3.3.0 or later. You can set the extension bundle version to "[3.3.0, 4.0.0)" in host.json.
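For reference, the bundle version range goes in the extensionBundle section of host.json. The fragment below is a minimal sketch that combines it with the concurrency settings shown earlier. The bundle id is the standard Microsoft.Azure.Functions.ExtensionBundle; adjust the rest to match your own host.json.

```json
{
  "version": "2.0",
  "extensionBundle": {
    "id": "Microsoft.Azure.Functions.ExtensionBundle",
    "version": "[3.3.0, 4.0.0)"
  },
  "concurrency": {
    "dynamicConcurrencyEnabled": true,
    "snapshotPersistenceEnabled": true
  }
}
```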
How to see Dynamic Concurrency’s adjustment logs?
You need to set the logLevel of ‘Host.Concurrency’ to ‘Trace’ in host.json to enable the logging of dynamic concurrency.
{
  "version": "2.0",
  "logging": {
    "logLevel": {
      "Host.Concurrency": "Trace"
    }
  },
  "concurrency": {
    "dynamicConcurrencyEnabled": true,
    "snapshotPersistenceEnabled": true
  }
}
Then you can see the dynamic concurrency logs in the file system logs or the Application Insights logs, as below. In this example, the platform saw that the CPU load was low and decided to increase the concurrency value of function ‘SBQueueFunction1’ to 138.
2022-06-13T13:19:17.078 [Debug] [HostMonitor] Host process CPU stats (PID 6576): History=(18,48,29,39,37), AvgCpuLoad=34, MaxCpuLoad=48
2022-06-13T13:19:17.078 [Debug] [HostMonitor] Host aggregate CPU load 34
2022-06-13T13:19:17.078 [Debug] FunctionApp7.SBQueueFunction1.Run Increasing concurrency
2022-06-13T13:19:17.078 [Debug] FunctionApp7.SBQueueFunction1.Run Concurrency: 138, OutstandingInvocations: 135
Where is the concurrency value stored?
The concurrency value of each function is stored in the storage account specified in the app setting ‘AzureWebJobsStorage’. The values are stored in the file ‘azure-webjobs-hosts / concurrency / functionApp_Name / concurrencyStatus.json’.
The concurrency value applies per instance. The file below means that, as of the timestamp, the concurrency value of function ‘SBQueueFunction1’ is 171 and the concurrency value of ‘BlobFunction1’ is 90. In other words, on each instance of the function app, the function ‘SBQueueFunction1’ can run up to 171 invocations simultaneously, and the function ‘BlobFunction1’ can run up to 90 invocations simultaneously.
{"Timestamp":"2022-06-13T13:19:55.6526064Z","NumberOfCores":1,"FunctionSnapshots":{"FunctionApp7.BlobFunction1.Run":{"Concurrency":90},"FunctionApp7.SBQueueFunction1.Run":{"Concurrency":171}}}
If you set ‘snapshotPersistenceEnabled’ to true in host.json, the platform reads the current concurrency value from this file. When the platform decides to increase or decrease the value, it writes the new value back to this file.
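If you want to inspect the persisted values programmatically, the snapshot is plain JSON and is easy to parse. The sketch below only covers the parsing step, using System.Text.Json on the sample content shown above; it assumes you have already downloaded the file from the storage account. The class and method names (ConcurrencySnapshotReader, ReadConcurrency) are hypothetical and not part of any Azure SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper, not part of the Functions host or any SDK.
public static class ConcurrencySnapshotReader
{
    // Maps each function name to its persisted concurrency value.
    public static Dictionary<string, int> ReadConcurrency(string json)
    {
        var result = new Dictionary<string, int>();
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement snapshots = doc.RootElement.GetProperty("FunctionSnapshots");
        foreach (JsonProperty function in snapshots.EnumerateObject())
        {
            result[function.Name] = function.Value.GetProperty("Concurrency").GetInt32();
        }
        return result;
    }

    public static void Main()
    {
        // Sample content copied from the concurrencyStatus.json example above.
        string json = @"{""Timestamp"":""2022-06-13T13:19:55.6526064Z"",""NumberOfCores"":1,""FunctionSnapshots"":{""FunctionApp7.BlobFunction1.Run"":{""Concurrency"":90},""FunctionApp7.SBQueueFunction1.Run"":{""Concurrency"":171}}}";

        foreach (KeyValuePair<string, int> pair in ReadConcurrency(json))
        {
            Console.WriteLine($"{pair.Key}: {pair.Value}");
        }
    }
}
```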
Test of the Dynamic concurrency:
Testing environment:
- An Azure function app in the B1 tier with only 1 instance. I wrote the C# code locally in Visual Studio and then published it to the function app.
- To avoid noise, the App Service plan contains only this function app, with only 1 test function.
- I have a simple Service Bus queue trigger in my function app, with the code below. It just consumes some CPU and then waits 1.5 seconds before returning.
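using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class SBQueueFunction1
{
    [FunctionName("SBQueueFunction1")]
    public async Task Run([ServiceBusTrigger("queue2", Connection = "sbconnection")] string myQueueItem, ILogger log)
    {
        log.LogInformation($"C# ServiceBus queue trigger function processed message: {myQueueItem}");

        // Consume a little CPU with a short floating-point loop.
        double a = 333.33;
        double b = 444.44;
        double c = 0;
        for (int i = 0; i < 10000; i++)
        {
            c = a * b;
            a = a + 0.1;
        }

        // Wait 1.5 seconds asynchronously before returning.
        await Task.Delay(1500);
    }
}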
- Each time, I sent a batch of 4000 messages to the Service Bus queue so that the function was triggered 4000 times. (Why 4000? Because the maximum size of my Service Bus batch is 4500.) If you don’t know how to send messages to a Service Bus queue, you can refer to: https://docs.microsoft.com/en-us/azure/service-bus-messaging/service-bus-dotnet-get-started-with-queues.
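For readers who want to reproduce the load, here is a minimal sketch of sending one batch of 4000 messages with the Azure.Messaging.ServiceBus package. It is not the exact sender used for these tests. The environment variable name SB_CONNECTION_STRING, the message bodies, and the BatchSender class are assumptions made for illustration; "queue2" is the queue name from the trigger above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus; // NuGet package: Azure.Messaging.ServiceBus

public static class BatchSender
{
    // Builds simple placeholder bodies; the real content does not matter for this test.
    public static List<string> BuildBodies(int count)
    {
        var bodies = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            bodies.Add($"message {i}");
        }
        return bodies;
    }

    public static async Task Main()
    {
        // Assumption: the connection string comes from a hypothetical environment variable.
        string connectionString = Environment.GetEnvironmentVariable("SB_CONNECTION_STRING")
            ?? throw new InvalidOperationException("Set SB_CONNECTION_STRING first.");
        const string queueName = "queue2";

        await using var client = new ServiceBusClient(connectionString);
        await using ServiceBusSender sender = client.CreateSender(queueName);

        // A batch enforces the size limit: TryAddMessage returns false once it is full.
        using ServiceBusMessageBatch batch = await sender.CreateMessageBatchAsync();
        foreach (string body in BuildBodies(4000))
        {
            if (!batch.TryAddMessage(new ServiceBusMessage(body)))
            {
                throw new InvalidOperationException("The batch is full; send fewer or smaller messages.");
            }
        }

        await sender.SendMessagesAsync(batch);
        Console.WriteLine($"Sent {batch.Count} messages to {queueName}.");
    }
}
```

Sending everything in a single batch makes all 4000 messages arrive at roughly the same time, so the measured duration reflects how quickly the function drains the queue rather than how quickly the sender produces messages.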
Tests:
Test scenario 1:
I set ‘snapshotPersistenceEnabled’ to false in host.json.
Finding:
In the function log, I saw that the concurrency value started at 1 and was increased many times. When the 4000 executions finished, the concurrency value was 137. In total, the 4000 executions took 118 seconds.
Test scenario 2:
I set ‘snapshotPersistenceEnabled’ to true in host.json.
Finding:
The result of this test was almost the same as in test 1. The concurrency value increased from 1 to 137.
Test scenario 3:
I set ‘snapshotPersistenceEnabled’ to true in host.json. I sent many batches (4000 messages each) to the Service Bus queue, one round after another, and recorded the change in the concurrency value and the time taken in each round.
Finding:
The results of this test are recorded below. The total time decreased in each round, and the concurrency value converged to 311.
| Round | Starting value of concurrency | Ending value of concurrency | Time taken (seconds) |
|---|---|---|---|
| Round 1 | 1 | 137 | 118 |
| Round 2 | 137 | 171 | 48 |
| Round 3 | 171 | 211 | 40 |
| Round 4 | 211 | 233 | 33 |
| Round 5 | 233 | 267 | 31 |
| Round 6 | 267 | 289 | 28 |
| Round 7 | 289 | 305 | 27 |
| Round 8 | 305 | 311 (max value 312) | 27 |
| Round 9 | 311 | 311 (no change) | 26 |
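As a rough sanity check on the table above, the sketch below computes the observed throughput of each round and an idealized lower bound on the round time. The lower bound is a simplification, not something the platform reports: it assumes the concurrency stays fixed at the round's ending value, that each invocation takes exactly the 1.5-second delay, and that CPU time and Service Bus overhead are zero. The RoundStats class and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper for back-of-the-envelope numbers; not part of any SDK.
public static class RoundStats
{
    public const int MessagesPerRound = 4000;
    public const double DelaySeconds = 1.5; // Task.Delay(1500) in the test function

    // Observed messages per second for a round.
    public static double Throughput(double secondsTaken) => MessagesPerRound / secondsTaken;

    // Idealized time: messages are processed in "waves" of `concurrency` invocations,
    // each wave taking DelaySeconds.
    public static double IdealSeconds(int concurrency) =>
        Math.Ceiling((double)MessagesPerRound / concurrency) * DelaySeconds;

    public static void Main()
    {
        // (round, ending concurrency, seconds taken) copied from the table above.
        (int Round, int EndConcurrency, int Seconds)[] rounds =
        {
            (1, 137, 118), (2, 171, 48), (3, 211, 40), (4, 233, 33), (5, 267, 31),
            (6, 289, 28), (7, 305, 27), (8, 311, 27), (9, 311, 26),
        };

        foreach (var r in rounds)
        {
            Console.WriteLine(
                $"Round {r.Round}: {Throughput(r.Seconds):F1} msg/s observed, " +
                $"ideal time at concurrency {r.EndConcurrency}: {IdealSeconds(r.EndConcurrency):F1} s");
        }
    }
}
```

For round 9, where the concurrency stayed at 311 for the whole round, this gives 13 waves and an idealized 19.5 seconds against the 26 seconds measured. Under these assumptions, the gap would be the per-invocation CPU work and messaging overhead that the model ignores.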
Test scenario 4:
I manually set the starting concurrency value to 500 in ‘azure-webjobs-hosts / concurrency / functionApp_Name / concurrencyStatus.json’ while ‘snapshotPersistenceEnabled’ was true in host.json. Then I sent batches of 4000 messages to the Service Bus queue continuously (over 50000 messages in total), again observing the concurrency value and the time taken.
Finding:
The concurrency value was decreased because of high CPU load and thread pool starvation, as shown below. The total time for 4000 executions was always longer than 26 seconds. Eventually, the concurrency value again converged, to around 320.
2022-06-13T15:52:12.598 [Debug] [HostMonitor] Host process CPU stats (PID 5760): History=(84,88,89,89,89), AvgCpuLoad=88, MaxCpuLoad=89
2022-06-13T15:52:12.598 [Debug] [HostMonitor] Host aggregate CPU load 88
2022-06-13T15:52:12.598 [Warning] [HostMonitor] Host CPU threshold exceeded (88 >= 80)
2022-06-13T15:52:12.598 [Warning] Possible thread pool starvation detected.
2022-06-13T15:52:12.599 [Debug] FunctionApp7.SBQueueFunction1.Run Decreasing concurrency (Enabled throttles: CPU,ThreadPoolStarvation)
2022-06-13T15:52:12.599 [Debug] FunctionApp7.SBQueueFunction1.Run Concurrency: 427, OutstandingInvocations: 429
Conclusions:
- With dynamic concurrency enabled, the concurrency manager adjusts the concurrency value gradually by monitoring instance health metrics, like CPU and thread utilization, and changes throttles as needed.
- If all of the instance’s metrics are healthy and the concurrency value is small, the platform increases the value. When the instance is not healthy, the concurrency manager decreases the concurrency value. The concurrency value converges to a numerical interval. In my test, the interval was 310 to 320.
- Setting the starting concurrency value too high is not a good choice. It might cause high CPU load and make the executions slower.