Guidance for protecting Azure Virtual Machine against Zonal/Regional outages using Azure Site Recovery and Azure Backup
Summary
Disaster Recovery (DR) and Backup are two ways to recover from outages. To ensure that you have the necessary controls to protect your data, even when relying on the native tools included by your provider, you must become familiar with the platform features, weigh the costs and benefits, and formulate the data protection strategy that works best for your business. The following summarizes the choices provided by Azure Backup and Azure Site Recovery:
- You can recover your Azure Virtual Machines and databases from zonal or regional outages by using Azure Backup. In the event of an outage, you can recover to either the secondary availability zone or the secondary paired Azure region, with an RPO (Recovery Point Objective) of approximately 36 hours.
- You can recover your Azure Virtual Machines from zonal or regional outages by using Azure Site Recovery. In the event of an outage, you can recover to either the secondary availability zone or the secondary Azure region, with an RPO (Recovery Point Objective) of approximately 5 minutes and an RTO (Recovery Time Objective) of less than 1 hour.
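- Outage scenarios and options:
| | Zonal Outage | Regional Outage |
| Azure Backup | Cross Zonal Restore (CZR) | Cross Region Restore (CRR) |
| Azure Site Recovery | Zone-to-Zone disaster recovery | Disaster recovery to a different target region |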
When it comes to protecting your Azure Virtual Machines from unwanted downtime and data loss, it is important to be familiar with several concepts, strategies, tools, and capabilities to make informed decisions. Review the subsequent sections for more detailed information.
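As a rough illustration of how the figures in the summary can drive a first-pass choice, the following minimal sketch filters the two services by recovery targets. The RPO and RTO values are the approximate figures quoted above; the `ProtectionOption` class and the `options_meeting` function are hypothetical names introduced only for this example, and no Azure API is called.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ProtectionOption:
    name: str
    rpo_minutes: float            # approximate RPO quoted in the summary
    rto_minutes: Optional[float]  # None where the summary quotes no RTO


# Figures from the summary: ~36 hours RPO for Azure Backup,
# ~5 minutes RPO and under 1 hour RTO for Azure Site Recovery.
OPTIONS = [
    ProtectionOption("Azure Backup", rpo_minutes=36 * 60, rto_minutes=None),
    ProtectionOption("Azure Site Recovery", rpo_minutes=5, rto_minutes=60),
]


def options_meeting(rpo_target_minutes: float,
                    rto_target_minutes: Optional[float] = None) -> List[str]:
    """Return the names of options whose quoted figures fit the targets."""
    matches = []
    for opt in OPTIONS:
        if opt.rpo_minutes > rpo_target_minutes:
            continue
        if rto_target_minutes is not None:
            # An option with no quoted RTO cannot be shown to meet an RTO target.
            if opt.rto_minutes is None or opt.rto_minutes > rto_target_minutes:
                continue
        matches.append(opt.name)
    return matches


if __name__ == "__main__":
    print(options_meeting(rpo_target_minutes=48 * 60))
    print(options_meeting(rpo_target_minutes=15, rto_target_minutes=120))
```

A real decision also has to weigh cost, supported regions, and the prerequisites described in the later sections, none of which this sketch models.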
Concepts
Azure provides high availability, disaster recovery, and backup solutions that can enable your applications to meet business availability requirements and recovery objectives. The following summarizes the key concepts and capabilities.
- Azure is divided physically and logically into units called Azure regions. There are many Azure regions across the world.
- Some Azure regions provide availability zones. Availability zones are physically separated groups of datacenters within a region. Each availability zone has independent power, cooling, and networking infrastructure. Availability zones help protect your deployment from datacenter failures.
- When you deploy into an Azure region that contains availability zones, you can use multiple availability zones together. Doing so lets you keep separate copies of your application and data in separate physical datacenters within a large metropolitan area.
- Each datacenter is assigned to a physical zone. Physical zones are mapped to logical zones in your Azure subscription, and different subscriptions might have a different mapping order.
- Availability Sets ensure that Azure VMs are deployed across multiple isolated hardware nodes in a cluster within a single Availability Zone (AZ). If a hardware or software failure happens, only a subset of your VMs is impacted and your overall solution stays operational. Availability Sets are essential for building redundancy into your cloud solution. The Azure platform assigns each VM in an Availability Set a Fault Domain and an Update Domain. This lets you host a multi-tier application across the fault domains. The VMs in a Fault Domain share a common power source and network switch, so a failure in a fault domain makes all the application resources in that fault domain unavailable.
- Availability sets protect against failures within a datacenter.
- Availability Zones protect against the failure of an entire datacenter within a single region.
- Azure paired regions (within the same geography, to maintain data residency) protect against a regional outage. A primary and a secondary region together form a region pair. Within a geography, Azure provides a list of regional pairs for certain regions (for example, Geography: Australia, Regional Pair A: Australia East, Regional Pair B: Australia Southeast). For the complete list, refer to Azure regional pairs.
- Certain regions do not have a paired region (for example, Qatar, Poland, and Israel).
- For regions that have a paired region, Azure uses cross-region replication to asynchronously replicate applications and data from one region to its paired region.
- During VM creation, a VM can be zone pinned or non-zone pinned, depending on the configuration choices:
Non-Zone Pinned: A VM created with “No infrastructure redundancy required” is created without any zone.
Zone Pinned: In this example (+Create VM > Basics tab), the TestVM is created in the selected zone. After the VM is created, its Availability Zone property can no longer be edited, which indicates that it is pinned to “Zone 1”.
Zone Pinned: Multi-selecting Zones 1, 2, and 3 makes the create step automatically create multiple VM instances: TestVM-1 (in Zone 1), TestVM-2 (in Zone 2), and TestVM-3 (in Zone 3).
- Certain Azure services, such as Azure Storage, take advantage of cross-region replication and provide two options for copying data to a secondary region: geo-redundant storage (GRS) and geo-zone-redundant storage (GZRS). With GRS or GZRS, the data in the secondary region isn't available for read or write access unless there's a failover to the secondary region. Storage failover is not automatic; it is initiated either by Microsoft (Microsoft-managed failover) or by you (customer-managed failover) (learn more).
- Shared responsibility is the crux of your strategic decision-making for disaster recovery. Azure doesn't require you to use cross-region replication, and not all Azure services automatically replicate data or automatically fail over from a failed region to another enabled region. Depending on the option that best fits your needs, you are responsible for configuring the recovery and replication.
- Availability sets protect against failures within a datacenter.
- Availability Zones protect against the failure of an entire datacenter within a single region. If you choose zone-redundant resources, your resources are spread across multiple availability zones. Microsoft manages spreading requests across zones and replicating data across zones, and if an outage occurs in a single availability zone, Microsoft manages failover automatically. If you choose zonal resources (that is, resources with no redundancy, pinned to a single availability zone), no automatic protection is provided during a zone outage. You're responsible for managing data replication, distributing requests across zones, and failover.
- Azure paired regions (within the same geography, to maintain data residency) protect against a regional outage. A primary and a secondary region together form a region pair. Azure uses cross-region replication to asynchronously replicate applications and data from one region to its paired region. Azure services such as Azure Storage take advantage of cross-region replication to copy your data to a secondary region, with options such as geo-redundant storage (GRS).
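The fault-domain behaviour described for Availability Sets can be pictured with a small simulation. This is only a sketch: the round-robin placement rule, the function names, and the VM names are assumptions made for illustration, and the text above does not state how Azure actually assigns fault domains.

```python
from collections import defaultdict
from typing import Dict, List


def assign_fault_domains(vm_names: List[str],
                         fault_domain_count: int) -> Dict[int, List[str]]:
    """Spread VMs over fault domains round-robin (an assumed placement rule)."""
    if fault_domain_count < 1:
        raise ValueError("fault_domain_count must be at least 1")
    placement: Dict[int, List[str]] = defaultdict(list)
    for index, name in enumerate(vm_names):
        placement[index % fault_domain_count].append(name)
    return dict(placement)


def surviving_vms(placement: Dict[int, List[str]],
                  failed_fault_domain: int) -> List[str]:
    """Return the VMs that stay up when one fault domain fails."""
    return sorted(
        name
        for fault_domain, names in placement.items()
        if fault_domain != failed_fault_domain
        for name in names
    )


if __name__ == "__main__":
    vms = [f"web-{i}" for i in range(6)]  # hypothetical VM names
    placement = assign_fault_domains(vms, fault_domain_count=3)
    print(placement)
    print(surviving_vms(placement, failed_fault_domain=1))
```

With six VMs over three fault domains, losing one fault domain takes down two VMs and leaves four running, which is the "only a subset of your VMs is impacted" property the Availability Set bullet describes.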
Azure Backup options to protect VMs and databases against Zonal and Regional outages
For Azure Virtual Machines and databases protected with Azure Backup, in the event of an outage you can recover (depending on configuration) to either the secondary availability zone or the secondary paired Azure region, with an RPO of approximately 36 hours.
Configuration to handle Zonal outage
You can restore an Azure VM from the default zone to any available zone; this is referred to as Cross Zonal Restore (CZR). During restore, you can choose which availability zone in that region the VM is restored to.
To use this functionality, the following pre-requisites must be met:
- The source protected Azure VM is zone pinned/non-zone pinned.
- The recovery point is present in the vault tier only (recovery points in the snapshot tier only, or in both the snapshot and vault tiers, are not supported).
- The recovery option is either to create a new VM or to restore disks (the replace disks option replaces the source data, so the availability zone option is not applicable).
- The VM or disks are created in the same region and the vault's storage redundancy is ZRS (CZR doesn't work when the vault's storage redundancy is GRS, even if the source VM is zone pinned).
- If the pinned zone is unavailable, you won't be able to restore the data to another zone because the backed-up data isn't zonally replicated. The restore in availability zones is possible from recovery points in vault tier only.
- Supported only for managed VMs.
- CZR supports restore with Managed System Identities (MSI).
- CZR supports restore of a zone pinned or non-zone pinned Azure VM from a vault with zone-redundant storage (ZRS) enabled.
- CZR supports restore of zone-pinned VM only from a vault with Cross Region Restore (CRR) (if the secondary region supports zones).
- CZR is supported from secondary regions.
- CZR isn't supported from snapshot restore points.
- CZR isn't supported for encrypted Azure VMs.
- As with Azure VMs, you can restore Azure VM disks from the default zone to any available zone. The default zone is the zone in which the VM disks reside.
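The prerequisites above can be read as a checklist. The following minimal sketch encodes several of them as a pre-flight check; `CzrRequest` and `czr_blockers` are hypothetical names for this example rather than Azure SDK types, and the vault rule is deliberately simplified, as noted in the comments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CzrRequest:
    """Hypothetical description of a restore request; not an Azure SDK type."""
    managed_disks: bool
    encrypted: bool
    recovery_point_tier: str   # "vault", "snapshot", or "snapshot_and_vault"
    restore_option: str        # "create_vm", "restore_disks", or "replace_disks"
    vault_redundancy: str      # "ZRS", "GRS", or "LRS"
    crr_enabled: bool = False  # Cross Region Restore enabled on the vault


def czr_blockers(req: CzrRequest) -> List[str]:
    """Return the listed prerequisites that the request fails (empty if none)."""
    blockers: List[str] = []
    if not req.managed_disks:
        blockers.append("only managed VMs are supported")
    if req.encrypted:
        blockers.append("encrypted Azure VMs are not supported")
    if req.recovery_point_tier != "vault":
        blockers.append("recovery point must be in the vault tier only")
    if req.restore_option not in ("create_vm", "restore_disks"):
        blockers.append("restore option must create a new VM or restore disks")
    # Simplification: a ZRS vault, or a vault with CRR enabled, is accepted.
    # The zone-pinned and secondary-region-zone conditions for CRR are not modelled.
    if req.vault_redundancy != "ZRS" and not req.crr_enabled:
        blockers.append("vault must use ZRS or have Cross Region Restore enabled")
    return blockers


if __name__ == "__main__":
    ok = CzrRequest(True, False, "vault", "create_vm", "ZRS")
    print(czr_blockers(ok))
    bad = CzrRequest(True, True, "snapshot", "replace_disks", "GRS")
    for reason in czr_blockers(bad):
        print("blocked:", reason)
```

Returning a list of reasons rather than a single boolean makes it easier to see which prerequisite needs attention before a restore is attempted.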
Configuration to handle Regional outage
Cross Region Restore (CRR) can be used to restore Azure VMs in the secondary region, which is an Azure paired region.
- If CRR is enabled, you can view the backup items in the secondary region. From the portal, go to Recovery Services vault > Backup items. Select Secondary Region to view the items in the secondary region.
[Screenshots: backup items in the Primary Region view and in the Secondary Region view]
- Restore in secondary region: The secondary region restore experience is similar to the primary region restore experience. When you configure your restore in the Restore Configuration pane, you're prompted to provide only secondary region parameters.
- Azure Backup uses the read-access geo-redundant storage (RA-GRS) capability of storage accounts to support restores from a secondary region. Because storage replication from the primary to the secondary region is delayed, you must factor in latency before backed-up data is available for restore in the secondary region.
- To use this capability, the following prerequisites must be met:
- You can restore all the Azure VMs for the selected recovery point if the backup is done in the secondary region.
- Restore from snapshot restore points isn't supported. During the backup, snapshots aren't replicated to the secondary region. Only the data stored in the vault is replicated, so secondary region restores are vault tier restores only.
- Restore is available with the “Create a VM” or “Restore Disks” option; the “Replace existing disks” option is not supported.
- Cross Region Restore is supported only for a Recovery Services vault that uses the GRS replication type.
- Virtual machines (VMs) created through Azure Resource Manager and encrypted Azure VMs are supported. VMs created through the classic deployment model aren't supported. You can restore the VM or its disk.
- SQL Server or SAP HANA databases hosted on Azure VMs are supported. You can restore databases or their files.
- MARS Agent is supported for vaults without private endpoint (preview).
- Review the support matrix for a list of supported managed types and regions.
- Using Cross Region Restore will incur additional charges. Once you enable Cross Region restore, it might take up to 48 hours for the backup items to be available in secondary regions. Learn more about pricing.
- Cross Region Restore currently can't be reverted to GRS or LRS after the protection starts for the first time.
- Currently, the secondary region RPO is 36 hours. This is because the RPO in the primary region is 24 hours and it can take up to 12 hours to replicate the backup data from the primary to the secondary region.
- Review the permissions required to use Cross Region Restore.
- CRR is currently not supported for machines running on Ultra disks.
- CRR restores CMK (customer-managed keys) enabled Azure VMs, which aren't backed-up in a CMK enabled Recovery Services vault, as non-CMK enabled VMs in the secondary region.
- For CRR, the Staging Location (that is the storage account location) must be in the region that the Recovery Services vault treats as the secondary region. For example, a Recovery Services vault is located in East US 2 region (with Geo-Redundancy and Cross Region Restore enabled). This means that the secondary region would be Central US. Therefore, you need to create a storage account in Central US to perform a Cross Region Restore of the VM.
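Two of the points above lend themselves to a quick check: the 36-hour secondary region RPO is the sum of the 24-hour primary RPO and the replication time of up to 12 hours, and the staging storage account must sit in the vault's secondary region. The sketch below assumes hypothetical helper names, and its `REGION_PAIRS` table holds only the two pairs mentioned in this document, not the full list of Azure region pairs.

```python
from typing import Dict

PRIMARY_RPO_HOURS = 24       # RPO in the primary region, as stated above
MAX_REPLICATION_HOURS = 12   # worst-case replication time to the secondary region

# Only the pairs named in this document; a real table would hold every pair.
REGION_PAIRS: Dict[str, str] = {
    "East US 2": "Central US",
    "Australia East": "Australia Southeast",
}


def secondary_region_rpo_hours(primary_rpo_hours: int = PRIMARY_RPO_HOURS,
                               replication_hours: int = MAX_REPLICATION_HOURS) -> int:
    """Worst-case RPO in the secondary region: primary RPO plus replication time."""
    return primary_rpo_hours + replication_hours


def staging_location_is_valid(vault_region: str, staging_region: str) -> bool:
    """True if the staging storage account is in the vault's secondary region."""
    if vault_region not in REGION_PAIRS:
        raise KeyError(f"no pair recorded in this sketch for {vault_region!r}")
    return staging_region == REGION_PAIRS[vault_region]


if __name__ == "__main__":
    print(secondary_region_rpo_hours())
    print(staging_location_is_valid("East US 2", "Central US"))
    print(staging_location_is_valid("East US 2", "East US 2"))
```

For the East US 2 vault in the example above, a staging account in Central US passes the check, while one in East US 2 does not.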
Azure Site Recovery (ASR) options to protect VMs against Zonal and Regional outages
For Azure Virtual Machines protected with Azure Site Recovery, in the event of an outage you can recover (depending on configuration) to either the secondary availability zone or the secondary Azure region, with an RPO of approximately 5 minutes and an RTO of less than 1 hour.
Configuration to handle Zonal outage
- The Zone-to-Zone disaster recovery option makes it possible to replicate applications and orchestrate their failover across Azure Availability Zones within a given region. This capability suits those who need to maintain data residency and local compliance, reduce the complexity of configuring a DR strategy in a secondary region, reduce latency, and improve the recovery point objective (RPO).
- If a single Azure availability zone is compromised, fail over your VMs to a different zone within the same region and access them from the secondary availability zone.
- If the VM is already deployed in an Availability Zone, go to Azure portal > Virtual machines, select the VM, and under Backup + disaster recovery select Disaster recovery. On the Basics tab, for Disaster recovery between availability zones?, select Yes, and complete the steps (learn more).
[Screenshots: the same configuration from the VM portal experience and from the Recovery Services vault portal]
- This feature is limited to Azure VMs in Availability Zones. Make sure you are familiar with all prerequisites and supported regions.
Configuration to handle Regional outage
- You can use ASR to protect virtual machines (VMs) to a different target region for regional DR. For example, a source VM located in East US can be protected to the UK West target region (learn more).
- Make sure you are familiar with supported platform features.
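The choice between the two ASR configurations above can be summarized in a few lines. This is a minimal sketch under the assumptions stated in the comments; `choose_asr_target` is a hypothetical helper, and it ignores the supported-region and prerequisite checks that the linked documentation covers.

```python
def choose_asr_target(outage_scope: str, vm_in_availability_zone: bool) -> str:
    """Pick an ASR configuration for the outage scope you want to survive.

    Assumptions for this sketch:
    - zone-to-zone DR is available only for VMs deployed in an availability zone;
    - region-to-region DR covers both zonal and regional outages.
    """
    if outage_scope not in ("zonal", "regional"):
        raise ValueError("outage_scope must be 'zonal' or 'regional'")
    if outage_scope == "zonal" and vm_in_availability_zone:
        return "zone-to-zone"
    return "region-to-region"


if __name__ == "__main__":
    print(choose_asr_target("zonal", vm_in_availability_zone=True))
    print(choose_asr_target("zonal", vm_in_availability_zone=False))
    print(choose_asr_target("regional", vm_in_availability_zone=True))
```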