General Availability: Predictive Autoscaling for VMSS
Published Oct 13 2022 11:34 AM

Azure autoscale automatically increases or decreases the number of VM instances in an Azure virtual machine scale set (VMSS) that run your application. This automated, elastic behavior reduces the management overhead of monitoring, reduces the need for overprovisioning, and helps optimize the performance of your application. You create rules that define the acceptable performance for a positive customer experience. When those defined thresholds are met, autoscale rules take action to adjust the capacity of your scale set. You can also schedule events to automatically increase or decrease the capacity of your scale set at fixed times.


We are pleased to announce that you can now use machine learning to help manage and scale out your Virtual Machine Scale Sets with Predictive autoscale for Percentage CPU metrics. The capacity needs of your Virtual Machine Scale Sets are forecasted based on historical CPU patterns. When predictive autoscale is enabled, the predicted overall CPU load is observed and scale-out occurs in advance, in time to meet the demand.


Predictive autoscale suits workloads that have cyclical patterns and is especially advantageous when Virtual Machines have long provisioning times, because the scale-out occurs before the workload demand reaches peak load, while lowering costs.

You can enable either Forecast Only or Predictive autoscale. Forecast Only allows you to view your predicted CPU forecast without triggering any scaling action based on the prediction. You can then compare the forecast with your actual workload patterns to build confidence in the prediction models before you enable the predictive autoscale feature. Enabling predictive autoscale will scale out resources based on the forecasted workloads.
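One way to build that confidence is to put a number on how far the forecast sits from what actually happened. The sketch below is only an illustration: it assumes you have exported forecast and actual CPU percentages as two equal-length lists (the function name and sample values are hypothetical), and it uses mean absolute error, which is our choice of metric here rather than anything the feature itself reports.

```python
from typing import Sequence


def mean_absolute_error(forecast: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap, in CPU percentage points, between two series."""
    if len(forecast) != len(actual):
        raise ValueError("forecast and actual must have the same length")
    if not forecast:
        raise ValueError("series must not be empty")
    return sum(abs(f - a) for f, a in zip(forecast, actual)) / len(forecast)


# Hypothetical hourly CPU percentages, for illustration only.
forecast_cpu = [40.0, 55.0, 70.0, 60.0]
actual_cpu = [42.0, 50.0, 75.0, 58.0]

error = mean_absolute_error(forecast_cpu, actual_cpu)
print(f"Mean absolute error: {error:.1f} percentage points")
```

A small, stable error across several daily or weekly cycles is one possible signal that the forecast tracks your workload well enough to move from Forecast Only to predictive autoscale.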

The predictive autoscale model works by observing the different capacity needs of your resources and then taking scale actions based on expected usage peaks. While predictive autoscale manages these expected demand peaks, we still recommend creating standard autoscale rules to handle any unexpected traffic spikes. Predictive autoscale can be configured and run alongside any of your existing standard autoscale rules. In cases where both predictive autoscale and a standard autoscale rule apply, the system will select the rule that results in the higher instance count to optimize for availability.
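The "higher instance count wins" behavior can be pictured with a few lines of Python. This is a simplified sketch of the selection described above, not the service's actual implementation; the function name and the use of None to mean "this mechanism has no recommendation right now" are assumptions made for illustration.

```python
from typing import Optional


def effective_instance_count(
    predictive: Optional[int],
    standard: Optional[int],
    current: int,
) -> int:
    """Pick the higher of the two recommended instance counts.

    None means that mechanism is not recommending a count at the moment.
    If neither applies, capacity stays at the current count.
    """
    candidates = [count for count in (predictive, standard) if count is not None]
    return max(candidates) if candidates else current


# Both apply: the higher count is chosen to favor availability.
print(effective_instance_count(predictive=6, standard=4, current=3))
# Only the standard rule applies (for example, an unexpected spike).
print(effective_instance_count(predictive=None, standard=5, current=3))
```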

Predictive autoscale can be configured via Azure Portal, CLI, ARM template and PowerShell.

To get started with predictive autoscale from the Azure Portal, navigate to the Virtual Machine Scale Set scaling blade and then to the Predictive autoscale section.




Using the Predictive Autoscale dropdown selection, you can:

  • Enable forecast only mode
  • Enable predictive autoscale



Once you’ve made your predictive autoscale selection (that is, enabled forecast only or predictive autoscale) and optionally configured how far in advance you want to scale out, click Save to start the process.
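Outside the portal, the same two choices (the mode and the optional look-ahead) are typically expressed as a small policy fragment on the autoscale setting. The Python sketch below builds such a fragment as a plain dictionary. The property names predictiveAutoscalePolicy, scaleMode, and scaleLookAheadTime, the mode values, and the ISO 8601 duration format reflect our understanding of the autoscale settings schema and are assumptions here; please confirm them against the current ARM template reference before use. The helper function itself is hypothetical.

```python
import json
from typing import Optional

# Assumed allowed values for scaleMode; confirm against the current schema.
VALID_SCALE_MODES = ("Disabled", "ForecastOnly", "Enabled")


def predictive_policy(scale_mode: str, look_ahead: Optional[str] = None) -> dict:
    """Build a predictiveAutoscalePolicy fragment as a plain dict.

    look_ahead is an ISO 8601 duration string such as "PT5M" (assumed format).
    """
    if scale_mode not in VALID_SCALE_MODES:
        raise ValueError(f"scale_mode must be one of {VALID_SCALE_MODES}")
    policy = {"scaleMode": scale_mode}
    if look_ahead is not None:
        policy["scaleLookAheadTime"] = look_ahead
    return {"predictiveAutoscalePolicy": policy}


# Forecast only: view predictions without triggering scale actions.
print(json.dumps(predictive_policy("ForecastOnly"), indent=2))
# Predictive autoscale with an illustrative five-minute look-ahead.
print(json.dumps(predictive_policy("Enabled", look_ahead="PT5M"), indent=2))
```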

A new tab, Predictive Charts, is now available in the scaling blade. It provides an in-depth overview of the forecasted CPU as well as any autoscaling that occurs if predictive autoscale is enabled.

Please note that Predictive autoscale requires a minimum of 7 days of history to provide predictions. The most accurate results come from 15 days of historical data.
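As a quick illustration of those two thresholds, the sketch below classifies a scale set by how many days of history it has. Only the numbers 7 and 15 come from the note above; the function name and the three labels are hypothetical.

```python
MIN_HISTORY_DAYS = 7           # minimum history needed for predictions
RECOMMENDED_HISTORY_DAYS = 15  # history that gives the most accurate results


def history_status(days_of_history: int) -> str:
    """Return a hypothetical label describing forecast readiness."""
    if days_of_history < 0:
        raise ValueError("days_of_history must not be negative")
    if days_of_history < MIN_HISTORY_DAYS:
        return "insufficient"
    if days_of_history < RECOMMENDED_HISTORY_DAYS:
        return "available"
    return "most accurate"


for days in (3, 7, 15):
    print(days, history_status(days))
```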





  • The top chart shows an overlaid comparison of actual versus predicted total CPU percentage.
  • The middle chart shows the number of instances running at specific times.
  • The bottom chart shows the current Average CPU utilization.


In addition, you can switch between forecast only and predictive autoscale, and you can view the charts with a time grain ranging from 12 hours to 7 days.



You can head over to Run History to review when scale outs occur due to predictive autoscale.




Predictive autoscale is generally available in all public regions as of today, Oct 12, 2022. There is no cost for turning on autoscale or Predictive autoscale; you pay only for the resources you maintain. For more information, you can refer to the documentation or email:
