Azure Architecture Blog
Disaster Recovery using cross-region replication with Azure NetApp Files datastores for AVS

GeertVanTeylingen
Jul 11, 2023

Table of Contents

Abstract

Introduction

Benefits

Prerequisites and general recommendations

Getting Started

Deploy Azure VMware Solution

Provision and configure Azure NetApp Files

Create volume replication for Azure NetApp Files-powered datastore volumes

DRO Installation

Prerequisites

OS requirements

DRO Configuration

Setup

Resource Grouping

Replication Plans

Failover and Failback

Summary

Additional Information

 

Abstract

In this article, we describe how NetApp Disaster Recovery Orchestrator (DRO) can simplify disaster recovery orchestration for virtual machines and applications running on Azure VMware Solution (AVS), using Azure NetApp Files datastores. We demonstrate how DRO enables AVS administrators to easily set up resource groups and disaster recovery replication plans, and to simulate failover.

 

Co-authors: Niyaz Mohamed, Principal Solutions Architect, Tech Evangelist, Migration & Modernization Advisory Specialist (NetApp)

 

Introduction

Disaster recovery using block-level replication between cloud regions is a resilient and cost-effective way of protecting workloads against site outages and data corruption events, such as ransomware attacks. With Azure NetApp Files (ANF) cross-region volume replication, VMware workloads running on a primary Azure VMware Solution (AVS) SDDC site that uses Azure NetApp Files volumes as NFS datastores can be replicated to a designated secondary AVS site in the target recovery region.

 

Disaster Recovery Orchestrator (DRO), a scripted solution with a UI, can be used to seamlessly recover workloads replicated from one AVS SDDC to another. DRO automates the recovery end to end: it breaks replication peering, mounts the destination volume as a datastore, registers the VMs to AVS, and applies network mappings directly on NSX-T (included with all AVS private clouds).

 

 

 

 

Benefits

The Azure NetApp Files and Azure VMware Solution disaster recovery solution leveraging cross-region replication provides you with the following benefits:

 

  • Leverage efficient and resilient Azure NetApp Files cross-region replication.
  • Recover to any available point-in-time with snapshot retention.
  • Fully automate all required steps to recover hundreds to thousands of VMs, across the storage, compute, network, and application validation steps.
  • Workload recovery leverages the “Create new volumes from the most recent snapshots” process, which doesn’t manipulate the replicated volume.
  • Avoid any risk of data corruption on the volumes or snapshots.
  • Avoid replication interruptions during DR test workflows.
  • Leverage DR data and cloud compute resources for workflows beyond DR, such as dev/test, security testing, patch and upgrade testing, and remediation testing.
  • CPU and RAM optimization can help lower AVS SDDC costs by allowing recovery to smaller compute clusters.

 

(i) Important

 

The Disaster Recovery Orchestrator (DRO) is community supported and available to customers at no additional cost.

 

 

Prerequisites and general recommendations

 

  • Verify that you have enabled cross-region replication by creating replication peering. See Create volume replication for Azure NetApp Files.
  • You must configure ExpressRoute Global Reach between the source and target Azure VMware Solution private clouds.
  • You must have a service principal that can access resources.
  • The following topology is supported: primary AVS site to secondary AVS site and failback.
  • Configure the replication schedule for each volume appropriately based on business needs and the data-change rate. 

 

Getting Started

In this section, we'll cover deploying Azure VMware Solution, provisioning and configuring Azure NetApp Files, and creating volume replication for Azure NetApp Files-powered datastore volumes.

 

Deploy Azure VMware Solution

Azure VMware Solution (AVS) is a hybrid cloud service that provides fully functional VMware SDDCs within the Microsoft Azure public cloud. AVS is a first-party solution that runs on Azure infrastructure, is fully managed and supported by Microsoft, and is verified by VMware. Therefore, customers get access to VMware ESXi for compute virtualization, vSAN for hyper-converged storage, and NSX for networking and security, all while taking advantage of Microsoft Azure’s global presence, class-leading data-center facilities, and proximity to the rich ecosystem of native Azure services and solutions. A combination of Azure VMware Solution SDDC and Azure NetApp Files provides the best performance with minimal network latency.

 

To configure an AVS private cloud on Azure, follow the steps in this article. A pilot-light environment set up with a minimal configuration can be used for DR purposes. This setup only contains core components to support critical applications, and it can scale out and spawn more hosts to take the bulk of the load if a failover occurs.

 

(!) Note

 

In the initial release, DRO supports an existing AVS SDDC cluster. On-demand SDDC creation will be available in an upcoming release.

 

 

Provision and configure Azure NetApp Files

Azure NetApp Files is a high-performance, enterprise-class, metered file-storage service. It provides NAS volumes as a service for which you can create NetApp accounts, capacity pools, select service and performance levels, create volumes, and manage data protection. It allows you to create and manage high-performance, highly available, and scalable file shares, using the same protocols and tools that you're familiar with and that enterprise applications rely on on-premises. Follow Attach Azure NetApp Files datastores to Azure VMware Solution hosts to provision and configure Azure NetApp Files as an NFS datastore to optimize AVS private cloud deployments.

 

Create volume replication for Azure NetApp Files-powered datastore volumes

The first step is to set up cross-region replication for the desired datastore volumes from the AVS primary site to the AVS secondary site with the appropriate frequencies and retentions.

 

 

Follow Create volume replication for Azure NetApp Files to set up cross-region replication by creating replication peering. The service level for the destination capacity pool can match that of the source capacity pool. However, for this specific use case, you can select the standard service level (lowest cost) and then modify the service level later to meet the performance requirements of a real disaster recovery or a simulation.

 

(!) Note

 

A cross-region replication relationship is a prerequisite and must be created beforehand.

 

 

DRO Installation

To get started with DRO, use the Ubuntu operating system on the designated Azure virtual machine and make sure you meet the prerequisites. Then install the package.

 

Prerequisites

  • Service principal that can access resources.
  • Make sure that appropriate connectivity exists to the source and destination SDDC and Azure NetApp Files volumes.
  • DNS resolution should be in place if you are using DNS names. You can also use IP addresses for vCenter.

 

OS requirements

Ubuntu Focal 20.04 (LTS)

 

The following packages must be installed on the designated agent virtual machine:

  • Docker
  • Docker-compose
  • jq

Change the permissions on docker.sock as follows:

sudo chmod 666 /var/run/docker.sock

 

(!) Note

 

The deploy.sh script included in the package takes care of all required prerequisites.

 

 

The steps are as follows:

  1. Download the installation package onto the designated virtual machine.
    git clone https://github.com/NetApp-Automation/DRO-Azure

    (!) Note

    The agent must be installed in the secondary AVS site region, or in the primary AVS site region in a different availability zone (AZ) from the SDDC.

  2. Navigate to the DRO-Azure directory, extract the files from DRO-prereq, then run the deployment script and enter the host IP (for example, 10.10.10.10):
    tar xvf DRO-prereq.tar

    Navigate to the directory and run the deploy script as below:
    sudo sh deploy.sh

  3. Access the UI using the following credentials:

    Username: admin
    Password: admin

 

DRO Configuration

After Azure NetApp Files and AVS have been configured properly, you can begin configuring DRO to automate the recovery of workloads from the primary AVS site to the secondary AVS site. NetApp recommends deploying the DRO agent in the secondary AVS site and configuring the ExpressRoute gateway connection so that the DRO agent can communicate via the network with the appropriate AVS and Azure NetApp Files components.

 

Setup

The first step is to add credentials. The DRO service uses API calls to discover and manage Azure NetApp Files and Azure VMware Solution resources within the Microsoft Azure subscription. To allow DRO to make these API calls, create a service principal, which is called an app registration in Microsoft Azure Active Directory. Typically, DRO uses the built-in Contributor role on the subscription, because this role covers all the API calls that DRO must perform within the subscription.

If your organization prefers to avoid the use of the Contributor role in the subscription, DRO supports the use of a custom role instead. The custom role must allow the specific API calls that DRO needs. To create a custom role, use a tool such as the Azure portal, Azure PowerShell, or Azure CLI to create a custom role definition that, at minimum, includes the mandatory permissions listed in the JSON below.

 

"actions": [
                    "*/read",
                    "microsoft.vmware/vcenters/Read",
                    "microsoft.vmware/vcenters/Write",
                    "microsoft.vmware/virtualmachines/Read",
                    "microsoft.vmware/virtualmachines/Write",
                    "microsoft.vmware/virtualnetworks/Read",
                    "Microsoft.NetApp/netAppAccounts/read",
                        "Microsoft.NetApp/netAppAccounts/capacityPools/read",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/read",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/write",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/delete",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ListReplications/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ReplicationStatus/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/BreakReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/Revert/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ReInitializeReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/AuthorizeReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ResyncReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/DeleteReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/RevertRelocation/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ReestablishReplication/action",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/snapshots/read",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/snapshots/write",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/subvolumes/read",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/subvolumes/write",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/snapshots/delete",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/ReplicationStatus/read",
                    "Microsoft.NetApp/netAppAccounts/snapshotPolicies/Volumes/action",
                    "Microsoft.NetApp/netAppAccounts/snapshotPolicies/ListVolumes/action",
                    "Microsoft.NetApp/locations/RegionInfo/read",
                    "Microsoft.NetApp/Operations/read",
                    "Microsoft.Resources/subscriptions/resourceGroups/read",
                    "Microsoft.Resources/checkResourceName/action",
                    "Microsoft.Resources/deployments/read",
                    "Microsoft.Resources/providers/read",
                    "Microsoft.Resources/resources/read",
                    "Microsoft.AVS/privateClouds/read",
                    "Microsoft.AVS/privateClouds/clusters/read",
                    "Microsoft.AVS/privateClouds/clusters/datastores/read",
                    "Microsoft.AVS/privateClouds/clusters/datastores/write",
                    "Microsoft.AVS/privateClouds/clusters/datastores/delete",
                    "Microsoft.AVS/privateClouds/clusters/datastores/operationstatuses/read",
                    "Microsoft.AVS/privateclouds/clusters/datastores/operationresults/read",
                    "Microsoft.AVS/privateClouds/workloadNetworks/segments/read",
                    "Microsoft.AVS/privateClouds/workloadNetworks/vmGroups/read",
                    "Microsoft.AVS/privateClouds/workloadNetworks/virtualMachines/read",
                    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/subvolumes/GetMetadata/action",
                    "Microsoft.NetApp/locations/operationresults/read",
                    "Microsoft.NetApp/locations/checknameavailability/action",
                    "Microsoft.ResourceGraph/resources/read",
                    "Microsoft.Resources/subscriptions/resources/read",
                    "Microsoft.Resources/subscriptions/read",
                    "Microsoft.Resources/subscriptions/operationresults/read",
                    "Microsoft.ResourceGraph/resourcesHistory/read",
                    "Microsoft.ApiManagement/operations/read",
                    "Microsoft.ApiManagement/service/tenants/apis/operations/read"
                ],
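The fragment above is only the actions array. As a minimal sketch, assuming the standard Azure custom role JSON layout (Name, IsCustom, Description, Actions, AssignableScopes), the following Python wraps such a list into a complete role definition. The role name, description, placeholder subscription ID, and the shortened action list are illustrative assumptions, not values taken from DRO.

```python
import json


def build_role_definition(name, description, actions, subscription_id):
    """Wrap a list of permitted actions in the Azure custom role JSON layout."""
    return {
        "Name": name,
        "IsCustom": True,
        "Description": description,
        "Actions": sorted(set(actions)),  # de-duplicate and keep a stable order
        "NotActions": [],
        "DataActions": [],
        "NotDataActions": [],
        "AssignableScopes": [f"/subscriptions/{subscription_id}"],
    }


# Shortened for illustration; a real role needs the full list shown above.
example_actions = [
    "*/read",
    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/read",
    "Microsoft.NetApp/netAppAccounts/capacityPools/volumes/BreakReplication/action",
    "Microsoft.AVS/privateClouds/clusters/datastores/write",
]

role = build_role_definition(
    name="DRO custom role (example)",  # hypothetical name
    description="Permissions for a DR orchestration service principal",
    actions=example_actions,
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
)

role_json = json.dumps(role, indent=2)
print(role_json)
```

The printed JSON could be saved to a file and supplied to a role-creation tool, for example the Azure CLI's az role definition create command with its --role-definition option.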

 

When you add source and destination environments, you are prompted to select the credentials associated with the service principal. You need to add these credentials to DRO before you can click Add New Site.

 

To perform this operation, complete the following steps:

  1. Open DRO in a supported browser and use the default username and password (admin/admin).
    The password can be reset after the first login using the Change Password option.
  2. In the upper right of the DRO console, click the Settings icon, and select Credentials.

  3. Click Add New Credential and follow the steps in the wizard.

  4. To define the credentials, enter information about the Azure Active Directory service principal that grants the required permissions:
    1. Credential name
    2. Tenant ID
    3. Client ID
    4. Client secret
    5. Subscription ID

    You should have captured this information when you created the AD application.

  5. Confirm the details about the new credentials and click Add Credential.

     

After you add the credentials, it’s time to discover and add the primary and secondary AVS sites (both vCenter and the Azure NetApp Files storage account) to DRO. To add the source and destination site, complete the following steps:

 

  1. Go to the Discover tab.

  2. Click Add New Site.

  3. Add the following primary AVS site (designated as Source in the console).
    1. SDDC vCenter
    2. Azure NetApp Files storage account

  4. Add the following secondary AVS site (designated as Destination in the console).
    1. SDDC vCenter
    2. Azure NetApp Files storage account

          

  5. Add site details by clicking Source, entering a friendly site name, and selecting the connector. Then click Continue.

    (!) Note

    For demonstration purposes, adding a source site is covered in this document.

  6. Update the vCenter details. To do this, select the credentials, Azure region, and resource group from the dropdown for the primary AVS SDDC.

  7. DRO lists all the available SDDCs within the region. Select the designated private cloud URL from the dropdown.

  8. Enter the cloudadmin@vsphere.local user credentials. These can be obtained from the Azure portal by following Tutorial: Access an Azure VMware Solution private cloud. Once done, click Continue.

  9. Select the Source Storage details (ANF) by selecting the Azure resource group and NetApp account.

  10. Click Create Site.

     

    Once added, DRO performs automatic discovery and displays the VMs that have corresponding cross-region replicas from the source site to the destination site. DRO automatically detects the networks and segments used by the VMs and populates them.

     

    The next step is to group the required VMs into their functional groups as resource groups.


 

Resource Grouping

After the platforms have been added, group the VMs you want to recover into resource groups. DRO resource groups allow you to group a set of dependent VMs into logical groups that contain their boot orders, boot delays, and optional application validations that can be executed upon recovery.

 

To create a resource group, complete the following steps:

  1. Access Resource Groups and click Create New Resource Group.

     

  2. Under New Resource Group, select the source site from the dropdown and click Create.

  3. Provide the resource group details and click Continue.

  4. Select appropriate VMs using the search option.

  5. Select the Boot Order and Boot Delay (secs) for all the selected VMs. Set the power-on sequence by selecting each virtual machine and assigning it a priority. The default value for all virtual machines is 3. The options are as follows:
    • 1 - The first virtual machine to power on
    • 3 - Default
    • 5 - The last virtual machine to power on

  6. Click Create Resource Group.
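The effect of the boot order and boot delay settings can be pictured as a simple sort. The following is a minimal sketch, not DRO's actual implementation: it assumes that VMs power on in ascending boot-order value, that ties are broken by VM name, and that the boot delay is a wait applied before each VM is powered on. The VmBootSetting type and the VM names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VmBootSetting:
    name: str
    boot_order: int = 3       # 1 = first, 3 = default, 5 = last
    boot_delay_secs: int = 0  # assumed: wait before this VM is powered on


def power_on_schedule(vms):
    """Return (start_offset_secs, vm_name) pairs in power-on sequence."""
    schedule = []
    elapsed = 0
    for vm in sorted(vms, key=lambda v: (v.boot_order, v.name)):
        elapsed += vm.boot_delay_secs
        schedule.append((elapsed, vm.name))
    return schedule


vms = [
    VmBootSetting("app01", boot_order=3, boot_delay_secs=60),
    VmBootSetting("dc01", boot_order=1),
    VmBootSetting("web01", boot_order=5, boot_delay_secs=30),
    VmBootSetting("db01", boot_delay_secs=120),  # keeps the default order of 3
]

for offset, name in power_on_schedule(vms):
    print(f"t+{offset:>3}s  power on {name}")
```

In this example the domain controller starts first, the two default-priority VMs follow in name order, and the web server starts last.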

 

Replication Plans

You must have a plan to recover applications in the event of a disaster. Select the source and destination vCenter platforms from the dropdown, pick the resource groups to be included in this plan, and define the order in which applications should be restored and powered on (for example, domain controllers, tier-1, tier-2, and so on). Plans are often called blueprints as well. To define the recovery plan, navigate to the Replication Plan tab, and click New Replication Plan.

 

To start creating a replication plan, complete the following steps:

  1. Navigate to Replication Plans and click Create New Replication Plan.

  2. On the New Replication Plan, provide a name for the plan and add recovery mappings by selecting the Source Site, its associated vCenter, the Destination Site, and its associated vCenter.

  3. After recovery mapping is complete, select the Cluster Mapping.

  4. Select Resource Group Details and click Continue.

  5. Set the execution order for the resource group. This option enables you to select the sequence of operations when multiple resource groups exist.

  6. Once done, set network mapping to the appropriate segment. The segments should already be provisioned on the secondary AVS cluster; select the appropriate segment to map the VMs to.

  7. Datastore mappings are automatically selected based on the selection of VMs.

    (!) Note

    Cross-region replication (CRR) is at the volume level. Therefore, all VMs residing on the respective volume are replicated to the CRR destination. Make sure to select all VMs that are part of the datastore, because only virtual machines that are part of the replication plan are processed.

  8. Under VM details, you can optionally resize the VMs' CPU and RAM parameters. This can be very helpful when you are recovering large environments to smaller target clusters or when you are conducting DR tests without having to provision a one-to-one physical VMware infrastructure. You can also modify the boot order and boot delay (secs) for all the selected VMs across the resource groups. By default, the boot order selected during resource-group creation is used; any changes needed can be made at this stage.

  9. Click Create Replication Plan.
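The note on cross-region replication above implies a check worth making before creating a plan: every VM on a replicated datastore should belong to one of the plan's resource groups, or it will not be processed. A minimal sketch of that check follows. It works on hypothetical inventory dictionaries, not on any DRO API, and the datastore, resource group, and VM names are made up.

```python
def vms_missing_from_plan(datastore_vms, resource_groups):
    """Map each datastore to the VMs on it that no resource group covers."""
    planned = {vm for members in resource_groups.values() for vm in members}
    missing = {}
    for datastore, vms in datastore_vms.items():
        uncovered = sorted(set(vms) - planned)
        if uncovered:
            missing[datastore] = uncovered
    return missing


# Hypothetical inventory: which VMs live on which replicated datastore.
datastore_vms = {
    "anf-ds-01": ["dc01", "app01", "db01"],
    "anf-ds-02": ["web01", "web02"],
}

# Hypothetical resource groups selected for the replication plan.
resource_groups = {
    "rg-infra": ["dc01"],
    "rg-app": ["app01", "db01", "web01"],
}

print(vms_missing_from_plan(datastore_vms, resource_groups))
```

Here web02 sits on a replicated datastore but is in no resource group, so it would be replicated yet left out of the recovery.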

 

Failover and Failback

 

After the replication plan is created, you can exercise the failover, test failover, or migrate options depending on your requirements.

 

 

During failover and test failover, the most recent snapshot is used; alternatively, a specific point-in-time snapshot can be selected for the test failover option. The point-in-time option can be very beneficial if you are facing a corruption event like ransomware, where the most recent replicas are already compromised or encrypted. DRO shows all available time points.
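As an illustration of why the point-in-time option matters, the sketch below picks the newest snapshot taken strictly before a known compromise time. DRO presents this choice in its UI; the code only shows the selection logic, and the snapshot names and timestamps are made up.

```python
from datetime import datetime, timezone


def utc(*args):
    """Build a timezone-aware UTC datetime."""
    return datetime(*args, tzinfo=timezone.utc)


def latest_clean_snapshot(snapshots, compromised_at):
    """Return the newest (name, created) pair older than compromised_at, or None."""
    clean = [snap for snap in snapshots if snap[1] < compromised_at]
    return max(clean, key=lambda snap: snap[1], default=None)


# Hypothetical snapshots on the replicated volume.
snapshots = [
    ("snap-0600", utc(2023, 7, 10, 6, 0)),
    ("snap-1200", utc(2023, 7, 10, 12, 0)),
    ("snap-1800", utc(2023, 7, 10, 18, 0)),
]

# Assumed time at which the data is known to have been compromised.
compromised_at = utc(2023, 7, 10, 14, 30)

print(latest_clean_snapshot(snapshots, compromised_at))
```

With these values the 18:00 snapshot is skipped because it was taken after the compromise, and the 12:00 snapshot is chosen instead of the most recent one.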

 

 

To trigger failover or test failover with the configuration specified in the replication plan, you can click Failover or Test Failover. You can monitor the replication plan in the task menu.

 

After failover is triggered, the recovered items can be seen in the secondary site AVS SDDC vCenter (VMs, networks, and datastores). By default, the VMs are recovered to the Workload folder.

 

 

Failback can be triggered at the replication plan level. For a test failover, the tear down option can be used to roll back the changes and remove the newly created volume. Ensure that enough capacity is available in the capacity pool when using the test failover option. Failback after a failover is a two-step process. First, select the replication plan and select Reverse Data sync.

 

 

After this step is complete, trigger failback to move back to the primary AVS site.

 


 

From the Azure portal, we can see that the replication relationship has been broken for the volumes that were mapped to the secondary site AVS SDDC as read/write volumes. During test failover, DRO does not map the destination or replica volume. Instead, it creates a new volume from the required cross-region replication snapshot and exposes that volume as a datastore. This consumes additional physical capacity from the capacity pool but ensures that the source volume is not modified. Notably, replication jobs can continue during DR tests or triage workflows. Additionally, this process makes sure that the recovery can be cleaned up without the risk of the replica being destroyed if errors occur or corrupted data is recovered.
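Because a test failover creates a new volume in the destination capacity pool, it helps to confirm headroom beforehand. The following sketch rests on two simplifying assumptions: the new volume needs a quota equal to the replica's quota, and free pool capacity is the pool size minus the sum of the existing volume quotas. All sizes are illustrative.

```python
def has_headroom_for_test_failover(pool_size_gib, volume_quotas_gib, replica_quota_gib):
    """True if the pool can hold one more volume the size of the replica."""
    free_gib = pool_size_gib - sum(volume_quotas_gib)
    return free_gib >= replica_quota_gib


pool_size_gib = 4096              # hypothetical 4 TiB destination pool
volume_quotas_gib = [1024, 2048]  # quotas of volumes already in the pool

# A 1 TiB replica fits in the remaining 1 TiB; a 2 TiB replica does not.
print(has_headroom_for_test_failover(pool_size_gib, volume_quotas_gib, 1024))
print(has_headroom_for_test_failover(pool_size_gib, volume_quotas_gib, 2048))
```

If the check fails, the capacity pool would need to be resized before running the test failover.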

 

Summary

In today's rapidly evolving digital landscape, businesses are faced with the constant challenge of ensuring the safety and availability of their critical data and applications. Disaster recovery is a crucial aspect of any organization's IT strategy, and finding the right solution that is both powerful and cost-effective can be a daunting task.

 

Disaster Recovery Orchestrator (DRO) is a game-changing technology that leverages the cross-region replication capabilities of Azure NetApp Files to provide an exceptional disaster recovery solution for virtual machines running on Azure VMware Solution Private Clouds. Azure NetApp Files not only offers scalable and high-performance storage but also includes cross-region replication, making it an ideal choice for safeguarding your valuable assets.

 

With DRO's simple and user-friendly orchestration-based failover mechanism, businesses can easily ensure the continuity of their operations in the face of unforeseen events. Whether it's a natural disaster, a hardware failure, or any other disruptive incident, DRO empowers organizations to swiftly recover and resume their activities with minimal downtime.

 

What sets DRO apart from other solutions is its flexibility. It caters to the diverse needs of customers, providing them with a customizable and adaptable disaster recovery option. This means that regardless of the size or complexity of your organization, DRO can be tailored to fit your specific requirements, offering a seamless and hassle-free experience.

 

By combining the robust capabilities of Azure NetApp Files with the streamlined failover orchestration provided by DRO, businesses can achieve a comprehensive and cost-efficient disaster recovery strategy. The synergy between these two technologies empowers organizations to safeguard their Azure VMware Solution Private Clouds with ease, ensuring data integrity, minimizing disruptions, and enabling uninterrupted business operations.

 

In summary, Disaster Recovery Orchestrator (DRO) offers an invaluable solution for customers seeking a flexible and reliable disaster recovery mechanism. By harnessing the power of Azure NetApp Files' cross-region replication and coupling it with DRO's intuitive orchestration, businesses can achieve peace of mind, knowing that their critical virtual machines are safeguarded and can swiftly recover from any unforeseen events.

 

Additional Information

To learn more about the information that is described in this document, review the following documents and/or websites:

 

Updated Jul 12, 2023
Version 2.0