Optimized autoscale – predictive planning

Microsoft

May 23, 2022

Optimized autoscale, with its reactive logic, has long served Azure Data Explorer users. As ingestion and query load rises and falls, the reactive logic adjusts the cluster size to match the changing needs.

Cluster scale out adds instances when more users send queries or the ingestion load increases, so that enough resources are available for performant ingestion and data exploration.

Reactive autoscale, as its name implies, reacts to the actual load on the cluster. Its response is therefore delayed by design, and it takes time to relieve the pressure on the cluster.

Predictive autoscale was recently added to solve this problem by planning cluster scale out and scale in ahead of time to further improve cluster performance while reducing cost.

For each cluster, the new predictive autoscale tracks the same main metrics as the reactive logic and, over time, builds the cluster's usage pattern.

When the pattern shows a high level of seasonality over time, we use the collected data to forecast the cluster's usage for the next day.
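To make the idea of "seasonality, then forecast" concrete, here is a minimal sketch in Python. It is not the algorithm Azure Data Explorer uses, which this post does not describe. It assumes hourly CPU samples, measures seasonality as the autocorrelation at a lag of one day, and forecasts with a simple per-hour average. The function names and the threshold value are hypothetical.

```python
from statistics import mean

# Hypothetical cut-off: treat the pattern as "seasonal enough" above this score.
SEASONALITY_THRESHOLD = 0.7


def daily_autocorrelation(samples, period=24):
    """Autocorrelation of the series at a lag of one period (one day of hourly samples)."""
    if len(samples) < 2 * period:
        raise ValueError("need at least two full periods of data")
    mu = mean(samples)
    denom = sum((x - mu) ** 2 for x in samples)
    if denom == 0:
        return 0.0  # a flat series has no seasonal signal
    num = sum(
        (samples[i] - mu) * (samples[i - period] - mu)
        for i in range(period, len(samples))
    )
    return num / denom


def forecast_next_day(samples, period=24):
    """Seasonal-naive forecast: average each hour of the day across the full days seen."""
    days = len(samples) // period
    if days == 0:
        raise ValueError("need at least one full period of data")
    recent = samples[-days * period:]
    return [
        mean(recent[d * period + h] for d in range(days))
        for h in range(period)
    ]
```

A cluster whose score from `daily_autocorrelation` exceeds `SEASONALITY_THRESHOLD` would, in this sketch, be a candidate for planning from `forecast_next_day`; a real forecaster would likely use a more robust model.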

We use this forecast to plan the size of the cluster, taking into account its SKU and specific properties.
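As a rough illustration of turning a forecast into a size plan, the sketch below converts forecast CPU percentages into instance counts. It is an assumption-laden example, not the service's planning logic: the target CPU level and the minimum and maximum instance counts are hypothetical stand-ins for the SKU and cluster properties mentioned above, and it assumes load spreads evenly across instances.

```python
import math


def plan_instance_counts(
    forecast_cpu_pct,
    current_instances,
    target_cpu_pct=60.0,   # hypothetical per-instance CPU level to aim for
    min_instances=2,       # hypothetical lower bound on cluster size
    max_instances=10,      # hypothetical upper bound on cluster size
):
    """Return one planned instance count per forecast sample."""
    plan = []
    for cpu in forecast_cpu_pct:
        # Total load, expressed as CPU percent summed over the current instances.
        load = cpu * current_instances
        # Instances needed to bring average CPU down (or up) to the target.
        needed = math.ceil(load / target_cpu_pct)
        # Clamp to the allowed range.
        plan.append(max(min_instances, min(max_instances, needed)))
    return plan
```

For example, with 4 instances and forecast CPU of 20%, 80%, and 95%, this sketch plans 2, 6, and 7 instances respectively.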

Below you can see a real-life example of a cluster's scale out and scale in operations before and after enabling predictive autoscale (marked by the red line).

The blue line tracks the actual usage of the cluster, as measured by CPU.

The green line represents the cluster's instance count. You can easily see that before the predictive logic was enabled, the instance count is generally higher (which means higher cost), and that scaling operations take place with a small delay after CPU consumption rises or falls.

After predictive autoscale was enabled, the daily scale in operations are much more significant and the scaling operations take place just in time. This results in significantly lower cost and higher performance throughout the high-load windows.

In special cases, when our resource allocation plan diverges significantly from the actual usage (for example, as a result of an unexpected surge in usage), we fall back to the reactive logic and scale the cluster out as needed.

This ensures we can also deal with anomalies and out-of-the-ordinary events.

Soon after, assuming the level of seasonality stays high enough, the cluster will go back to using the predictive logic.
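The fallback behavior described above can be sketched as a small decision function. This is only an illustration under stated assumptions: the divergence and seasonality thresholds, the parameter names, and the idea of comparing a single CPU reading to its forecast are all hypothetical, and the actual service logic may differ.

```python
def choose_instance_count(
    planned,
    reactive,
    actual_cpu_pct,
    forecast_cpu_pct,
    seasonality,
    divergence_threshold_pct=25.0,  # hypothetical: how far actual may stray from forecast
    seasonality_threshold=0.7,      # hypothetical: minimum seasonality to trust the plan
):
    """Pick between the predictive plan and the reactive result.

    Returns a (mode, instance_count) tuple.
    """
    diverged = abs(actual_cpu_pct - forecast_cpu_pct) > divergence_threshold_pct
    if diverged or seasonality < seasonality_threshold:
        # The plan cannot be trusted right now: use the reactive logic.
        return "reactive", reactive
    # Usage matches the forecast and the pattern is seasonal: follow the plan.
    return "predictive", planned
```

Once the actual usage is back in line with the forecast and the seasonality score is still high, the same function returns the predictive plan again.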

We're happy to announce that this new capability is now available for all Azure Data Explorer clusters that use "Optimized autoscale". Predictive autoscale is enabled by default on these clusters, so you do not need to take any action or change any configuration to enjoy its benefits.

For more details on the logic of Optimized autoscale, read this - https://docs.microsoft.com/en-us/azure/data-explorer/manage-cluster-horizontal-scaling#logic-of-optimized-autoscale

You're welcome to add more proposals and ideas around autoscale and other topics here and vote for them - https://aka.ms/adx.ideas

ADX team

Updated Jun 28, 2022

Version 2.0

gabil
