In this article, we describe how NetApp Disaster Recovery Orchestrator (DRO) simplifies disaster recovery orchestration for virtual machines and applications running on Azure VMware Solution (AVS) with Azure NetApp Files datastores. We demonstrate how DRO enables AVS administrators to set up resource groups and replication plans and to simulate failover.
Co-authors: Niyaz Mohamed, Principal Solutions Architect, Tech Evangelist, Migration & Modernization Advisory Specialist (NetApp)
Disaster recovery using block-level replication between cloud regions is a resilient and cost-effective way to protect workloads against site outages and data corruption events such as ransomware attacks. With Azure NetApp Files (ANF) cross-region volume replication, VMware workloads that run on a primary Azure VMware Solution (AVS) SDDC site and use Azure NetApp Files volumes as NFS datastores can be replicated to a designated secondary AVS site in the target recovery region.
Disaster Recovery Orchestrator (DRO), a scripted solution with a UI, can be used to seamlessly recover workloads replicated from one AVS SDDC to another. DRO automates the recovery sequence: it breaks replication peering, mounts the destination volume as a datastore, registers the VMs with AVS, and applies network mappings directly on NSX-T (included with all AVS private clouds).
The Azure NetApp Files and Azure VMware Solution disaster recovery solution, which leverages cross-region replication, provides the following benefits:
The Disaster Recovery Orchestrator (DRO) is community supported and available to customers at no additional cost.
In this section, we'll cover deploying Azure VMware Solution, provisioning and configuring Azure NetApp Files, and creating volume replication for Azure NetApp Files-powered datastore volumes.
Azure VMware Solution (AVS) is a hybrid cloud service that provides fully functional VMware SDDCs within the Microsoft Azure public cloud. AVS is a first-party solution that runs on Azure infrastructure, is fully managed and supported by Microsoft, and is verified by VMware. Customers therefore get access to VMware ESXi for compute virtualization, vSAN for hyper-converged storage, and NSX for networking and security, all while taking advantage of Microsoft Azure’s global presence, class-leading data-center facilities, and proximity to the rich ecosystem of native Azure services and solutions. A combination of Azure VMware Solution SDDC and Azure NetApp Files provides the best performance with minimal network latency.
To configure an AVS private cloud on Azure, follow the steps in this article. A pilot-light environment set up with a minimal configuration can be used for DR purposes. This setup only contains core components to support critical applications, and it can scale out and spawn more hosts to take the bulk of the load if a failover occurs.
In the initial release, DRO supports an existing AVS SDDC cluster. On-demand SDDC creation will be available in an upcoming release.
Azure NetApp Files is a high-performance, enterprise-class, metered file-storage service. It provides NAS volumes as a service: you can create NetApp accounts and capacity pools, select service and performance levels, create volumes, and manage data protection. You can create and manage high-performance, highly available, and scalable file shares by using the same protocols and tools that you're familiar with and that enterprise applications rely on on-premises. Follow Attach Azure NetApp Files datastores to Azure VMware Solution hosts to provision and configure Azure NetApp Files as an NFS datastore to optimize AVS private cloud deployments.
The first step is to set up cross-region replication for the desired datastore volumes from the AVS primary site to the AVS secondary site with the appropriate frequencies and retentions.
Follow Create volume replication for Azure NetApp Files to set up cross-region replication by creating replication peering. The service level for the destination capacity pool can match that of the source capacity pool. However, for this specific use case, you can select the standard service level (lowest cost) and then modify the service level to adjust to performance requirements demanded by a real disaster recovery or simulation.
A cross-region replication relationship is a prerequisite and must be created beforehand.
To get started with DRO, use the Ubuntu operating system on the designated Azure virtual machine and make sure you meet the prerequisites. Then install the package.
Ubuntu Focal 20.04 (LTS)
The following packages must be installed on the designated agent virtual machine:
Change the permissions on docker.sock as follows:
sudo chmod 666 /var/run/docker.sock
The deploy.sh script included in the package executes all required prerequisites.
The steps are as follows:
git clone https://github.com/NetApp-Automation/DRO-Azure
The agent must be installed either in the secondary AVS site region or in the primary AVS site region, in a different availability zone (AZ) from the SDDC.
tar xvf DRO-prereq.tar
sudo sh deploy.sh
After Azure NetApp Files and AVS have been configured properly, you can begin configuring DRO to automate the recovery of workloads from the primary AVS site to the secondary AVS site. NetApp recommends deploying the DRO agent in the secondary AVS site and configuring the ExpressRoute gateway connection so that the DRO agent can communicate via the network with the appropriate AVS and Azure NetApp Files components.
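Before configuring DRO, it can help to confirm that the agent VM actually reaches the endpoints it needs over the ExpressRoute connection. The following is a minimal sketch of such a check, not part of DRO itself. It only tests whether a TCP connection can be opened; the function name `can_reach` and any host names you pass to it are assumptions made for illustration.

```python
import socket


def can_reach(host, port, timeout_s=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for illustration; it checks network reachability
    only and does not validate credentials or API availability.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Run from the agent VM, a call such as `can_reach("vcenter.example.internal", 443)` (a placeholder host name) would indicate whether the vCenter HTTPS endpoint is reachable before you add the site to DRO.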
The first step is to add credentials. The DRO service uses API calls to discover and manage Azure NetApp Files and Azure VMware Solution resources within your Microsoft Azure subscription. To allow DRO to make these API calls in the subscription, create a service principal, which is called an app registration in Microsoft Azure Active Directory. Typically, DRO uses the built-in Contributor role on the subscription, because this role covers all the API calls that DRO must perform. If your organization prefers to avoid the Contributor role, DRO supports the use of a custom role instead. The custom role must allow the specific API calls that DRO needs. To create it, use a tool such as the Azure portal, Azure PowerShell, or Azure CLI to create a custom role definition that, at a minimum, includes the mandatory permissions listed in the JSON.
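As an illustration of what a custom role definition looks like, here is a minimal sketch that assembles one in the JSON layout that Azure CLI and Azure PowerShell accept for custom roles. It does not reproduce DRO's mandatory permission list: the role name, the all-zero subscription ID, and the single action in the usage example are placeholders and assumptions, and the real actions must come from the DRO documentation.

```python
import json


def build_custom_role(name, description, subscription_id, actions):
    """Build an Azure custom role definition as a dict.

    `actions` must be the permission list that DRO requires; this sketch
    deliberately does not guess it.
    """
    if not actions:
        raise ValueError("actions must list the permissions DRO requires")
    return {
        "Name": name,
        "IsCustom": True,
        "Description": description,
        "Actions": list(actions),
        "NotActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


# Placeholder values only; the action below is NOT DRO's actual permission list.
role = build_custom_role(
    name="DRO custom role (example)",
    description="Illustrative role definition for the DRO service principal",
    subscription_id="00000000-0000-0000-0000-000000000000",
    actions=["Microsoft.NetApp/netAppAccounts/read"],
)
print(json.dumps(role, indent=2))
```

The printed JSON could be saved to a file and passed to `az role definition create --role-definition`, after replacing the placeholders with your subscription ID and the full set of required actions.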
When you add source and destination environments, you are prompted to select the credentials associated with the service principal. You need to add these credentials to DRO before you can click Add New Site.
To perform this operation, complete the following steps:
You should have captured this information when you created the AD application.
After you add the credentials, it’s time to discover and add the primary and secondary AVS sites (both vCenter and the Azure NetApp Files storage account) to DRO. To add the source and destination site, complete the following steps:
For demonstration purposes, adding a source site is covered in this document.
Once added, DRO performs automatic discovery and displays the VMs that have corresponding cross-region replicas from the source site to the destination site. DRO automatically detects the networks and segments used by the VMs and populates them.
The next step is to group the required VMs into their functional groups as resource groups.
After the platforms have been added, group the VMs you want to recover into resource groups. DRO resource groups allow you to group a set of dependent VMs into logical groups that contain their boot orders, boot delays, and optional application validations that can be executed upon recovery.
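To make the idea of boot orders and boot delays concrete, the sketch below models a resource group as a list of rules and derives a power-on timeline from it. This is not DRO's internal data model. The class and function names are hypothetical, and the semantics (VMs with the same boot order start together, and the next group waits for the longest delay in the current one) are an assumption chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmBootRule:
    name: str
    boot_order: int        # lower numbers power on first
    boot_delay_s: int = 0  # seconds to wait before the next boot-order group starts


def power_on_schedule(rules):
    """Return (vm_name, start_offset_seconds) pairs in power-on order."""
    schedule = []
    offset = 0
    for order in sorted({r.boot_order for r in rules}):
        group = sorted((r for r in rules if r.boot_order == order),
                       key=lambda r: r.name)
        for rule in group:
            schedule.append((rule.name, offset))
        # The next group waits for the longest delay requested in this one.
        offset += max(rule.boot_delay_s for rule in group)
    return schedule


# Hypothetical resource group: domain controller first, then app and DB, then web.
rules = [
    VmBootRule("app01", boot_order=2, boot_delay_s=60),
    VmBootRule("dc01", boot_order=1, boot_delay_s=120),
    VmBootRule("db01", boot_order=2, boot_delay_s=90),
    VmBootRule("web01", boot_order=3),
]
print(power_on_schedule(rules))
# [('dc01', 0), ('app01', 120), ('db01', 120), ('web01', 210)]
```

Writing the dependencies down this way before creating the resource group in the UI makes it easier to spot a VM that would start before a service it depends on.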
To start creating resource groups, click the Create New Resource Group menu item.
You must have a plan to recover applications in the event of a disaster. Plans are often called blueprints as well. To define the recovery plan, navigate to the Replication Plan tab and click New Replication Plan. Select the source and destination vCenter platforms from the drop-down list, pick the resource groups to include in the plan, and specify the order in which applications should be restored and powered on (for example, domain controllers, tier-1, tier-2, and so on).
To start creating a replication plan, complete the following steps:
Cross-region replication (CRR) operates at the volume level. Therefore, all VMs residing on a given volume are replicated to the CRR destination. Make sure to select all VMs that are part of the datastore, because only virtual machines that are part of the replication plan are processed.
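Because replication covers the whole volume while recovery covers only the VMs in the plan, it is worth checking for VMs that are replicated but not selected. The following minimal sketch performs that comparison on plain Python data. The function name and the inventory shown are hypothetical; in practice you would take the datastore-to-VM mapping from vCenter.

```python
def missing_from_plan(vms_by_datastore, plan_vms):
    """Map each replicated datastore to the VMs on it that the plan omits."""
    selected = set(plan_vms)
    gaps = {}
    for datastore, vms in vms_by_datastore.items():
        omitted = sorted(set(vms) - selected)
        if omitted:
            gaps[datastore] = omitted
    return gaps


# Hypothetical inventory: two replicated datastores and the VMs on each.
inventory = {
    "anf-ds-01": ["dc01", "app01", "db01"],
    "anf-ds-02": ["web01"],
}
plan = ["dc01", "app01", "web01"]

print(missing_from_plan(inventory, plan))
# {'anf-ds-01': ['db01']}
```

An empty result means every VM on the replicated datastores is covered by the plan. Any VM listed would be replicated to the destination but not processed during failover.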
After the replication plan is created, you can exercise the failover, test failover, or migrate options depending on your requirements.
By default, failover and test failover use the most recent snapshot. For test failover, you can instead select a specific point-in-time snapshot. The point-in-time option can be very beneficial if you are facing a corruption event like ransomware, where the most recent replicas are already compromised or encrypted. DRO shows all available time points.
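The choice between the most recent snapshot and an earlier point in time can be expressed as a simple rule: take the newest snapshot, unless an incident time is known, in which case take the newest snapshot created at or before it. The sketch below illustrates that rule only. The snapshot names, timestamps, and function name are hypothetical, and in DRO the selection is made in the UI.

```python
from datetime import datetime, timezone


def pick_snapshot(snapshots, not_after=None):
    """Pick the newest (name, created) snapshot, optionally capped at not_after."""
    candidates = [s for s in snapshots
                  if not_after is None or s[1] <= not_after]
    if not candidates:
        raise ValueError("no snapshot predates the requested point in time")
    return max(candidates, key=lambda s: s[1])


# Hypothetical replica snapshots, all in UTC.
snapshots = [
    ("snap-0800", datetime(2024, 1, 10, 8, 0, tzinfo=timezone.utc)),
    ("snap-0900", datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc)),
    ("snap-1000", datetime(2024, 1, 10, 10, 0, tzinfo=timezone.utc)),
]

# Normal case: use the most recent snapshot.
print(pick_snapshot(snapshots)[0])  # snap-1000

# Ransomware suspected from 09:30 UTC: use the last snapshot before that time.
incident = datetime(2024, 1, 10, 9, 30, tzinfo=timezone.utc)
print(pick_snapshot(snapshots, not_after=incident)[0])  # snap-0900
```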
To trigger failover or test failover with the configuration specified in the replication plan, you can click Failover or Test Failover. You can monitor the replication plan in the task menu.
After failover is triggered, the recovered items (VMs, networks, and datastores) can be seen in the secondary site AVS SDDC vCenter. By default, the VMs are recovered to the Workload folder.
Failback can be triggered at the replication plan level. For a test failover, use the tear-down option to roll back the changes and remove the newly created volume. Make sure that enough capacity is available in the capacity pool before you run a test failover. Failback after an actual failover is a two-step process. First, select the replication plan and select Reverse Data sync.
After this step is complete, trigger failback to move back to the primary AVS site.
From the Azure portal, we can see that replication has been broken for the volumes that were mapped to the secondary site AVS SDDC as read/write volumes. During test failover, DRO does not map the destination or replica volume. Instead, it creates a new volume from the required cross-region replication snapshot and exposes that volume as a datastore. This approach consumes additional physical capacity from the capacity pool, but it ensures that the source volume is not modified. Notably, replication jobs can continue during DR tests or triage workflows. This process also means that the recovery can be cleaned up without the risk of the replica being destroyed if errors occur or corrupted data is recovered.
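Because a test failover creates an extra volume, a quick capacity calculation beforehand avoids a failed test. The sketch below is a minimal illustration that assumes pool consumption is accounted by the provisioned size (quota) of each volume. The function name and all sizes are hypothetical.

```python
GIB = 1024 ** 3


def can_host_test_volume(pool_size_bytes, existing_quota_bytes, test_volume_quota_bytes):
    """Return (fits, free_bytes) for adding a test volume to a capacity pool.

    Assumes the pool is consumed by volume quotas, so free space is the
    pool size minus the sum of the existing quotas.
    """
    free = pool_size_bytes - sum(existing_quota_bytes)
    return test_volume_quota_bytes <= free, free


# Hypothetical destination pool: 4 TiB with two replica volumes already in it.
pool = 4096 * GIB
existing = [2048 * GIB, 1024 * GIB]

fits, free = can_host_test_volume(pool, existing, 1024 * GIB)
print(fits, free // GIB)  # True 1024

fits, free = can_host_test_volume(pool, existing, 1536 * GIB)
print(fits, free // GIB)  # False 1024
```

If the check fails, the capacity pool would need to be resized before the test, and it can be shrunk again after the tear-down removes the test volume.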
In today's rapidly evolving digital landscape, businesses are faced with the constant challenge of ensuring the safety and availability of their critical data and applications. Disaster recovery is a crucial aspect of any organization's IT strategy, and finding the right solution that is both powerful and cost-effective can be a daunting task.
Disaster Recovery Orchestrator (DRO) addresses this challenge. It leverages the cross-region replication capabilities of Azure NetApp Files to provide a disaster recovery solution for virtual machines running on Azure VMware Solution private clouds. Azure NetApp Files not only offers scalable and high-performance storage but also includes cross-region replication, making it an ideal choice for safeguarding your valuable assets.
With DRO's simple and user-friendly orchestration-based failover mechanism, businesses can easily ensure the continuity of their operations in the face of unforeseen events. Whether it's a natural disaster, a hardware failure, or any other disruptive incident, DRO empowers organizations to swiftly recover and resume their activities with minimal downtime.
What sets DRO apart from other solutions is its flexibility. It caters to the diverse needs of customers, providing them with a customizable and adaptable disaster recovery option. This means that regardless of the size or complexity of your organization, DRO can be tailored to fit your specific requirements, offering a seamless and hassle-free experience.
By combining the robust capabilities of Azure NetApp Files with the streamlined failover orchestration provided by DRO, businesses can achieve a comprehensive and cost-efficient disaster recovery strategy. The synergy between these two technologies empowers organizations to safeguard their Azure VMware Solution Private Clouds with ease, ensuring data integrity, minimizing disruptions, and enabling uninterrupted business operations.
In summary, Disaster Recovery Orchestrator (DRO) offers an invaluable solution for customers seeking a flexible and reliable disaster recovery mechanism. By harnessing the power of Azure NetApp Files' cross-region replication and coupling it with DRO's intuitive orchestration, businesses can achieve peace of mind, knowing that their critical virtual machines are safeguarded and can swiftly recover from any unforeseen events.