Today, we announce HDInsight Autoscale with enhanced capabilities, including improved latency and support for recommissioning node managers during load-based autoscale. Together, these improvements substantially increase cluster utilization and lower the total cost of ownership (TCO). We also introduce customizable Autoscale parameters that give customers more flexibility to tune the feature to their preferences.
Note: Effective 17 May 2023, the enhanced autoscale capabilities are available to HDInsight customers on supported workloads. HDInsight Autoscale supports various cluster shapes and has been generally available since November 7, 2019.
The Autoscale feature helps customers leverage the elasticity of the cloud, and HDInsight Autoscale now offers significant improvements in scale-up and scale-down latencies for both load-based and schedule-based autoscaling. In the enhanced version, the average scaling latency has been reduced by nearly 3x. Enhanced Autoscale uses a new, fast, and reliable provisioning workflow, which makes scaling decisions more effective.
The numbers below indicate the latency improvements with enhanced autoscale, measured while scaling a 50-node Spark cluster:
| Scaling | Cluster Type | Old | Enhanced |
|---|---|---|---|
| Scale Up | ESP - Spark | ~29 mins | ~10 mins |
| Scale Up | Non-ESP - Spark | ~25 mins | ~7 mins |
| Scale Down | ESP - Spark | ~15 mins | ~4 mins |
| Scale Down | Non-ESP - Spark | ~11 mins | ~0.5 mins |
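As a refresher on the two autoscale modes mentioned above, the sketch below shows how load-based and schedule-based autoscale are typically expressed in a cluster's compute profile (the worker node role), following the schema described in Automatically scale Azure HDInsight clusters | Microsoft Learn. The node counts, days, and time zone are placeholder values for illustration only.

```python
# Sketch of the two autoscale modes as they appear under the worker node role
# of a cluster's compute profile. Values are placeholders; see the Learn
# article referenced below for the authoritative schema and deployment steps.

# Load-based autoscale: the cluster grows and shrinks between the two bounds
# based on observed YARN load.
load_based_autoscale = {
    "capacity": {
        "minInstanceCount": 3,
        "maxInstanceCount": 10,
    }
}

# Schedule-based autoscale: the worker node count follows a fixed timetable.
schedule_based_autoscale = {
    "recurrence": {
        "timeZone": "Pacific Standard Time",
        "schedule": [
            {
                "days": ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
                "timeAndCapacity": {
                    "time": "09:00",
                    "minInstanceCount": 10,
                    "maxInstanceCount": 10,
                },
            },
            {
                "days": ["Saturday", "Sunday"],
                "timeAndCapacity": {
                    "time": "18:00",
                    "minInstanceCount": 3,
                    "maxInstanceCount": 3,
                },
            },
        ],
    }
}

# Either dictionary is placed under
# properties.computeProfile.roles[workernode].autoscale in the cluster
# resource definition (for example, in an ARM template or SDK call).
```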
HDInsight Autoscale now supports recommissioning nodes before provisioning new ones. If there are nodes in the decommissioning state waiting to gracefully decommission, autoscale selects as many of them as required and recommissions them so they can share the increased load. Cluster load is re-evaluated after a cooldown period, and new nodes are added if still needed. This significantly reduces the time to add cluster capacity, since recommissioning takes seconds. The feature is available in load-based autoscale on supported cluster shapes (Spark and Hive) that use YARN for resource management.
Note: Spark Streaming support with Autoscale is on the roadmap.
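To illustrate the recommission-before-provision flow described above, here is a minimal, illustrative Python sketch. It is not the actual HDInsight implementation: the helper functions and node names are stand-ins, and the real service samples YARN metrics rather than the simple checks shown here.

```python
import time

# Illustrative sketch only, not the HDInsight autoscale implementation.
# The helpers below are stand-ins for the real recommission/provision operations.

def recommission_node_manager(node: str) -> None:
    print(f"recommissioning NodeManager on {node} (takes seconds)")

def provision_new_nodes(count: int) -> None:
    print(f"provisioning {count} brand-new worker nodes (takes minutes)")

def cluster_still_underprovisioned() -> bool:
    return False  # stand-in for re-sampling YARN load after the cooldown

def scale_up(required_nodes: int, decommissioning_nodes: list[str],
             cooldown_seconds: int = 120) -> None:
    # 1. Reuse nodes that are still gracefully decommissioning first; they
    #    already exist, so bringing them back is nearly instantaneous.
    reuse = decommissioning_nodes[:required_nodes]
    for node in reuse:
        recommission_node_manager(node)

    # 2. Cooldown: give the recommissioned nodes time to absorb load before
    #    the cluster load is re-evaluated.
    time.sleep(cooldown_seconds)

    # 3. Only add new capacity if the cluster is still under-provisioned.
    remaining = required_nodes - len(reuse)
    if remaining > 0 and cluster_still_underprovisioned():
        provision_new_nodes(remaining)

if __name__ == "__main__":
    scale_up(required_nodes=4, decommissioning_nodes=["wn2", "wn5"],
             cooldown_seconds=2)
```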
The following configurations can be tuned to customize HDInsight Autoscale to customer needs; a small sanity-check sketch for these settings follows the table. They are applicable for the HDInsight 4.0 and 5.0 stacks.
| Configuration | Description | Default value | Applicable cluster / Autoscale type | Remarks |
|---|---|---|---|---|
| yarn.4_0.graceful.decomm.workaround.enable | Enable YARN graceful decommissioning | Load-based autoscale – True | Hadoop/Spark | If this configuration is disabled, YARN moves nodes to the Decommissioned state directly from the Running state without waiting for the applications using the node to finish. This can lead to applications being killed abruptly when nodes are decommissioned. |
| yarn.graceful.decomm.timeout | YARN graceful decommissioning timeout in seconds | Hadoop load-based – 3600 | Hadoop/Spark | The graceful decommissioning timeout is best configured according to customer applications. For example, if an application has many mappers and few reducers and can take 4 hours to complete, this configuration needs to be set to more than 4 hours. |
| yarn.max.scale.up.increment | Maximum number of nodes to scale up in one go | 200 | Hadoop/Spark/Interactive Query | This has been tested with 200 nodes. We do not recommend configuring it to more than 200; it can be set to less than 200 if a less aggressive scale-up is preferred. |
| yarn.max.scale.down.increment | Maximum number of nodes to scale down in one go | 50 | Hadoop/Spark/Interactive Query | Can be set to up to 100. |
| nodemanager.recommission.enabled | Enable recommissioning of decommissioning NodeManagers before adding new nodes to the cluster | True | Hadoop/Spark load-based autoscale | Disabling this feature can cause cluster underutilization even when there is more load on the cluster: nodes can sit in the decommissioning state with no containers running while waiting for applications to finish. Note: Applicable for images on 2304280205 or later. |
| UnderProvisioningDiagnoser.time.ms | Time in milliseconds for which the cluster needs to be underprovisioned for a scale-up to trigger | 180000 | Hadoop/Spark load-based autoscale | |
| OverProvisioningDiagnoser.time.ms | Time in milliseconds for which the cluster needs to be overprovisioned for a scale-down to trigger | 180000 | Hadoop/Spark load-based autoscale | |
| hdfs.decommission.enable | Decommission DataNodes before triggering decommissioning of NodeManagers. HDFS does not support a graceful decommission timeout; decommissioning is immediate | True | Hadoop/Spark | DataNodes are decommissioned before NodeManagers so that the DataNode in question is not used to store shuffle data. |
| scaling.recommission.cooldown.ms | Cooldown period after recommissioning during which no metrics are sampled | 120000 | Hadoop/Spark load-based autoscale | This cooldown period ensures that the cluster has some time to redistribute load to the newly recommissioned NodeManagers. Note: Applicable for images on 2304280205 or later. |
| scale.down.nodes.with.ams | Scale down nodes where an Application Master (AM) is running | false | Hadoop/Spark | Can be turned on if enough retry attempts are configured for the AM. Useful where long-running applications (for example, Spark Streaming) would otherwise be killed in order to scale the cluster down when load has reduced. Note: Applicable for images on 2304280205 or later. |
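To make the guidance in the Remarks column concrete, here is a minimal sketch that sanity-checks a set of intended overrides before they are applied through your usual cluster configuration workflow. The property names come from the table above; the specific override values and the assumed 4-hour longest-application runtime are illustrative assumptions only.

```python
# Minimal sketch: sanity-check intended Autoscale overrides against the
# guidance in the table above. Override values and the assumed longest
# application runtime are illustrative only.

LONGEST_APP_RUNTIME_SECONDS = 4 * 3600  # assumed: longest job (many mappers, few reducers)

overrides = {
    "yarn.graceful.decomm.timeout": 5 * 3600,    # seconds; should exceed the longest application
    "yarn.max.scale.up.increment": 100,          # tested up to 200; keep <= 200
    "yarn.max.scale.down.increment": 50,         # can be raised up to 100
    "nodemanager.recommission.enabled": True,    # keep enabled to avoid underutilization
    "scaling.recommission.cooldown.ms": 120000,  # time to redistribute load after recommission
}

def validate(cfg: dict) -> list[str]:
    """Return a list of violations of the tuning guidance in the table above."""
    problems = []
    if cfg["yarn.graceful.decomm.timeout"] <= LONGEST_APP_RUNTIME_SECONDS:
        problems.append("yarn.graceful.decomm.timeout should exceed the longest application runtime")
    if cfg["yarn.max.scale.up.increment"] > 200:
        problems.append("yarn.max.scale.up.increment has only been tested up to 200")
    if cfg["yarn.max.scale.down.increment"] > 100:
        problems.append("yarn.max.scale.down.increment can be set to at most 100")
    return problems

if __name__ == "__main__":
    for message in validate(overrides) or ["overrides are consistent with the guidance above"]:
        print(message)
```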
References:
Automatically scale Azure HDInsight clusters | Microsoft Learn
Versioning introduction - Azure HDInsight | Microsoft Learn
Open-source components and versions - Azure HDInsight | Microsoft Learn
Customize Azure HDInsight clusters by using script actions | Microsoft Learn
Azure HDInsight architecture with Enterprise Security Package | Microsoft Learn