Today we're announcing the public preview of Resilient create and delete, a new feature for Virtual Machine Scale Sets (VMSS) that increases the reliability of virtual machine create and delete operations. VMSS automatically recovers from failed creates and deletes by retrying those operations on customers’ behalf, reducing the manual effort required to detect and clean up unused resources.
Customers often spend significant time understanding, debugging, and fixing failures during create and delete operations, which can lead to frustration and decreased productivity. Resilient create and delete monitors for failures during these operations and automatically recovers or deletes the affected virtual machines, increasing reliability without additional effort from customers. This feature is available across all public Azure regions.
Key Benefits
- Higher reliability in creating and deleting virtual machines in scale sets.
- Automated recovery of failed operations to significantly reduce manual toil for customers.
- Reduced Time to Detect (TTD) and Time to Mitigate (TTM) for virtual machine creates and deletes in scale sets.
- Comprehensive error handling for virtual machine deletes by retrying on all error codes.
- Reliable initiation of cleanup for unusable capacity.
Resilient create
Resilient create runs on virtual machines created during a scale-out of a scale set or during the initial scale set creation. It initiates retries only for OS Provisioning Timeout and Virtual Machine Start Timeout errors. Resilient create attempts the create operation up to five times per virtual machine, or for a maximum of 30 minutes in total across all retries.
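The retry rule described above can be modeled with a short Python sketch. This is an illustration only, not the VMSS implementation: the function name `resilient_create`, the error-code strings, and the injectable `clock` parameter are assumptions made for the example, and the sketch assumes that the five attempts include the initial create and that retries stop at whichever limit is reached first.

```python
import time
from typing import Callable, Optional

# Hypothetical labels for the two retryable failure categories.
RETRYABLE_CREATE_ERRORS = {"OSProvisioningTimedOut", "VMStartTimedOut"}
MAX_ATTEMPTS = 5               # attempts per virtual machine (assumed to include the first)
MAX_TOTAL_SECONDS = 30 * 60    # 30-minute budget across all attempts


def resilient_create(create_vm: Callable[[], Optional[str]],
                     clock: Callable[[], float] = time.monotonic) -> tuple:
    """Call create_vm until it succeeds or a limit is reached.

    create_vm returns None on success or an error-code string on failure.
    Returns (succeeded, attempts_made, last_error).
    """
    start = clock()
    attempts = 0
    last_error = None
    while attempts < MAX_ATTEMPTS and clock() - start < MAX_TOTAL_SECONDS:
        attempts += 1
        last_error = create_vm()
        if last_error is None:
            return True, attempts, None
        if last_error not in RETRYABLE_CREATE_ERRORS:
            break  # only the two timeout errors are retried
    return False, attempts, last_error
```

In this model, a create that fails with any other error code is attempted once and then left for the caller to handle, while a persistent timeout is retried until the attempt or time limit is exhausted.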
Resilient delete
Resilient delete initiates delete retries for any error, including but not limited to InternalExecutionError, TransientFailure, and InternalOperationError. Unlike Resilient create, which currently retries only on OS Provisioning Timeout and VM Start Timeout errors, Resilient delete doesn't differentiate between error codes: all failed deletes are retried. It attempts the delete operation up to five times per virtual machine.
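To make the contrast with Resilient create concrete, here is a minimal Python sketch of a delete loop that retries on every error code. The function name `resilient_delete` and the callback shape are hypothetical, and the sketch assumes the five attempts include the initial delete; it is a model of the described behavior, not the service's code.

```python
from typing import Callable, List, Optional, Tuple

MAX_DELETE_ATTEMPTS = 5  # attempts per virtual machine (assumed to include the first)


def resilient_delete(delete_vm: Callable[[], Optional[str]]) -> Tuple[bool, List[str]]:
    """Call delete_vm up to five times, retrying regardless of error code.

    delete_vm returns None on success or an error-code string on failure.
    Returns (succeeded, errors_seen_in_order).
    """
    errors: List[str] = []
    for _ in range(MAX_DELETE_ATTEMPTS):
        error = delete_vm()
        if error is None:
            return True, errors
        errors.append(error)  # no filtering: every failure is retried
    return False, errors
```

For example, a delete that fails with InternalExecutionError and then TransientFailure before succeeding would return `True` together with those two error codes.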
Setting up Resilient create and delete
To enable Resilient create and delete on an existing Virtual Machine Scale Set, navigate to your scale set resource in the Azure portal. Under “Capabilities”, select “Health and repair”, and enable “Resilient VM create (Preview)” and “Resilient VM delete (Preview)”.
To enable Resilient create and delete on a new Virtual Machine Scale Set during deployment, navigate to the “Health” tab and go to “Recovery”. Select the “Resilient VM create (Preview)” and/or “Resilient VM delete (Preview)” checkboxes.
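For template-based or scripted deployments, the same two settings can be expressed as a fragment of the scale set's resource properties. The Python sketch below builds such a fragment; the property names (`resiliencyPolicy`, `resilientVMCreationPolicy`, `resilientVMDeletionPolicy`) are given here as an assumption and should be checked against the scale set resource reference for the API version you deploy with. The helper name `resiliency_policy` is hypothetical.

```python
import json


def resiliency_policy(create: bool = True, delete: bool = True) -> dict:
    """Build a scale set properties fragment toggling the two features.

    Property names are assumed, not taken from the announcement.
    """
    return {
        "properties": {
            "resiliencyPolicy": {
                "resilientVMCreationPolicy": {"enabled": create},
                "resilientVMDeletionPolicy": {"enabled": delete},
            }
        }
    }


# Print the fragment so it can be merged into a deployment template.
print(json.dumps(resiliency_policy(), indent=2))
```

As in the portal, the two settings are independent, so either one can be enabled without the other.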
Learn More
You can learn more about how Resilient create and delete works and enroll in the preview in the documentation.