The original Japanese Edition is here.
Medium edition is here.
The other day, one of my customers asked me the following question.
Inquiry from customer
We have a system which consists of Azure Load Balancer and two VMs behind the load balancer. To meet our rules around BCDR (business continuity & disaster recovery), we would like to migrate this system with Azure Site Recovery (ASR), but the issue of “Site Recovery configuration failed (151196)” happens and prevents us from configuring ASR. What is the root cause? Do you have any workarounds or solutions?
As the inquiry was not clear to me, I asked them to elaborate on their environment and the issue.
- They use Standard Load Balancer.
- ExpressRoute connects their on-premises environment to Azure, and forced tunneling is enabled.
- The application running on the VMs uses Table storage as a data source. They have already configured a Service Endpoint for Table storage.
- Because state is not shared between the VMs, a simple VM-by-VM migration is all that is required.
The following diagram reflects the customer's environment as I understood it.
The VNet connected to ExpressRoute is not a hub network, so the integration between ExpressRoute and Site Recovery described in the following document is not required in this case.
Integrate ExpressRoute with disaster recovery for Azure VMs
https://docs.microsoft.com/azure/site-recovery/azure-vm-disaster-recovery-with-expressroute
Cause
If you are familiar with Azure, you will probably spot the root cause at once.
“Standard Load Balancer prevents the VMs behind it from reaching anything outside the VNet, so outbound access to the ASR-related resources outside the VNet has to be configured explicitly. Forced tunneling is indeed configured, but it does not take effect for VMs behind Standard Load Balancer.”
This is mentioned in the documentation.
Issue 2: Site Recovery configuration failed (151196)
If the VMs are behind a Standard internal load balancer, by default, it wouldn’t have access to the Microsoft 365 IPs such as login.microsoftonline.com. Either change it to Basic internal load balancer type or create outbound access as mentioned in the article Configure load balancing and outbound rules in Standard Load Balancer using Azure CLI.
ASR needs access to Azure Active Directory endpoints such as login.microsoftonline.com, but no outbound access to such services had been configured. Forced tunneling lets you redirect or “force” all Internet-bound traffic back to your on-premises location, with the default route advertised from the on-premises side. However, forced tunneling does not work for VMs behind Standard Load Balancer.
Outbound connectivity
The VMs need outbound connectivity to the following endpoints when they are replicated with Azure Site Recovery.

| Service | Endpoint |
|---|---|
| Storage | *.blob.core.windows.net |
| Azure Active Directory | login.microsoftonline.com |
| Replication | *.hypervrecoverymanager.windowsazure.com |
| Service Bus | *.servicebus.windows.net |
This is mentioned in the following document.
Troubleshoot Azure-to-Azure VM network connectivity issues
https://docs.microsoft.com/azure/site-recovery/azure-to-azure-troubleshoot-network-connectivity
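The endpoint list above can be turned into a quick check to run on one of the VMs. The following is only a minimal sketch: the function names `classify_host` and `can_connect` are made up for illustration, the host `mycache.blob.core.windows.net` is a hypothetical cache storage account, and port 443 is assumed because these are HTTPS endpoints.

```python
import socket
from fnmatch import fnmatch

# Endpoint patterns copied from the list of required outbound destinations.
REQUIRED_ENDPOINTS = {
    "Storage": "*.blob.core.windows.net",
    "Azure Active Directory": "login.microsoftonline.com",
    "Replication": "*.hypervrecoverymanager.windowsazure.com",
    "Service Bus": "*.servicebus.windows.net",
}


def classify_host(host):
    """Return the service whose pattern matches the host name, or None."""
    host = host.lower().rstrip(".")
    for service, pattern in REQUIRED_ENDPOINTS.items():
        if fnmatch(host, pattern):
            return service
    return None


def can_connect(host, port=443, timeout=5.0):
    """Try to open a TCP connection; True on success, False on any socket error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical host names; replace them with the ones used in your vault.
    for name in ("login.microsoftonline.com", "mycache.blob.core.windows.net"):
        print(name, classify_host(name), can_connect(name))
```

If `can_connect` returns False for login.microsoftonline.com on a VM behind Standard Load Balancer, that is consistent with the cause described above, although a failed TCP connection alone does not prove it.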
Solutions
We have the following options to establish outbound connectivity required for replicating VMs with Azure Site Recovery.
- Replace Standard Load Balancer with Basic Load Balancer.
- Assign public IPs to VMs behind Standard Load Balancer.
- Assign NAT Gateway to subnet where VMs connect.
- Add Public Load Balancer and configure outbound rule from VMs.
- Add Azure Firewall, configure UDR (User defined route) to route 0.0.0.0/0 to Azure Firewall, and set UDR to the subnet where VMs connect.
- Use Service Endpoint and Private Endpoint to open routes to required services.
1. Replace Standard Load Balancer with Basic Load Balancer.
Basic Load Balancer lets the VMs behind it reach destinations outside the VNet by default, while Standard Load Balancer does not.
Azure Load Balancer SKUs
https://docs.microsoft.com/azure/load-balancer/skus
With forced tunneling enabled, replication traffic leaves the Azure boundary (that is, it goes out to the Internet). As the following document says, this configuration is not recommended. This option is fine if forced tunneling is disabled.
Forced tunneling
https://docs.microsoft.com/azure/site-recovery/azure-to-azure-about-networking#forced-tunneling
2. Assign public IPs to VMs behind Standard Load Balancer.
A public IP is assigned to each of the two VMs so that they can reach destinations outside the VNet directly.
With this solution, the VMs can not only send traffic out but also receive inbound traffic from outside the VNet. So, the following configuration is mandatory.
- An NSG (Network Security Group) should be configured to control inbound and outbound traffic.
- It is simpler to associate the NSG with the subnet where the VMs connect than with each VM's NIC.
If Microsoft network routing is chosen, traffic between the VMs and Azure services stays within the Azure boundary.
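How such an NSG behaves can be reasoned about with a toy model: rules are evaluated in priority order (lowest number first) and the first match decides. The sketch below is a simplification, not the real NSG engine. The `Rule` class and `evaluate` function are hypothetical, peers are plain labels rather than resolved address ranges, Azure's built-in default rules are not modeled, and the service tag names (`Storage`, `AzureActiveDirectory`, `EventHub`, `AzureSiteRecovery`) are assumptions that should be checked against the ASR networking documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    priority: int   # lower number is evaluated first
    direction: str  # "Inbound" or "Outbound"
    peer: str       # destination (outbound) or source (inbound) label
    access: str     # "Allow" or "Deny"


# Assumed rule set: allow the tags ASR needs, deny other Internet traffic.
RULES = [
    Rule(100, "Outbound", "Storage", "Allow"),
    Rule(110, "Outbound", "AzureActiveDirectory", "Allow"),
    Rule(120, "Outbound", "EventHub", "Allow"),
    Rule(130, "Outbound", "AzureSiteRecovery", "Allow"),
    Rule(4000, "Outbound", "Internet", "Deny"),
    Rule(4000, "Inbound", "Internet", "Deny"),
]


def evaluate(rules, direction, peer):
    """Return the access of the first matching rule by priority.

    Returns None when nothing matches, meaning the decision would fall
    through to Azure's default rules, which this sketch does not model.
    """
    for rule in sorted(rules, key=lambda r: r.priority):
        if rule.direction == direction and rule.peer == peer:
            return rule.access
    return None


if __name__ == "__main__":
    print(evaluate(RULES, "Outbound", "AzureSiteRecovery"))
    print(evaluate(RULES, "Inbound", "Internet"))
```

Keeping the deny rules at a high priority number leaves room to add further allow rules later without renumbering.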
3. Assign NAT Gateway to subnet where VMs connect.
Instead of assigning public IP addresses to the VMs, a NAT gateway is associated with the subnet where the VMs connect.
A NAT gateway handles outbound access only; inbound traffic cannot use the public IP address(es) assigned to the NAT gateway. So, a NAT gateway prevents the VMs from being accessed from outside the VNet.
If Microsoft network routing is chosen, traffic between the VMs and Azure services stays within the Azure boundary.
Virtual Network NAT Documentation
https://docs.microsoft.com/azure/virtual-network/nat-gateway/
4. Add Public Load Balancer and configure outbound rule from VMs.
A Public Load Balancer with an outbound rule lets us permit outbound traffic from the VMs behind the load balancer.
This solution is similar to the 2nd and 3rd solutions, but it is more expensive than either of them. If Microsoft network routing is chosen, traffic between the VMs and Azure services stays within the Azure boundary.
5. Add Azure Firewall, configure UDR (User defined route) to route 0.0.0.0/0 to Azure Firewall, and set UDR to the subnet where VMs connect.
Azure Firewall allows us to manage inbound and outbound traffic to and from the VMs. The default route of the subnet where the VMs connect is pointed at Azure Firewall with a UDR (User Defined Route).
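Why the UDR matters here can be sketched with Azure's route selection order: the longest matching prefix wins, and when prefixes tie, a user-defined route is preferred over a BGP route, which is preferred over a system route. The sketch below is a simplified model only. The `select_route` function, the route table contents, and the IP addresses are hypothetical; `VirtualAppliance` is the next hop type used when a UDR points at a firewall's private IP.

```python
import ipaddress

# Tie-break order when several routes share the same prefix.
SOURCE_RANK = {"User": 0, "BGP": 1, "Default": 2}

# Hypothetical effective routes for the subnet where the VMs connect.
ROUTES = [
    {"prefix": "10.0.0.0/16", "source": "Default", "next_hop": "VirtualNetwork"},
    {"prefix": "0.0.0.0/0", "source": "Default", "next_hop": "Internet"},
    # Default route advertised from on-premises over ExpressRoute (forced tunneling).
    {"prefix": "0.0.0.0/0", "source": "BGP", "next_hop": "VirtualNetworkGateway"},
    # UDR that sends 0.0.0.0/0 to Azure Firewall.
    {"prefix": "0.0.0.0/0", "source": "User", "next_hop": "VirtualAppliance"},
]


def select_route(routes, destination):
    """Pick the route for a destination: longest prefix first, then source rank."""
    dest = ipaddress.ip_address(destination)
    candidates = [r for r in routes if dest in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda r: (
            -ipaddress.ip_network(r["prefix"]).prefixlen,
            SOURCE_RANK[r["source"]],
        ),
    )


if __name__ == "__main__":
    print(select_route(ROUTES, "10.0.1.4")["next_hop"])
    print(select_route(ROUTES, "20.50.1.1")["next_hop"])
```

Removing the `User` entry from `ROUTES` makes the BGP-advertised default route win, which is the forced tunneling behavior described earlier.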
6. Use Service Endpoint and Private Endpoint to open routes to required services.
Instead of assigning public IP address(es) to the VMs or to the subnet, routes to the services required for ASR replication are opened with Service Endpoints and Private Endpoints.
The services required for ASR replication, and the options available for each, are listed below.
- Azure Active Directory: Service Endpoint (if access to Microsoft 365 must be permitted explicitly, NAT Gateway is the best solution)
- Service Bus: Service Endpoint only (the destination cannot be identified in advance, so Service Endpoint is the only option)
- Storage: either Service Endpoint or Private Endpoint
- Recovery Services vault: Private Endpoint only
This solution is ideal for the following reasons.
- No traffic leaves the Azure boundary.
- Public IP addresses are not required, unless NAT Gateway is added for Microsoft 365 access.
- It is cost effective.
Note the following points when configuring this solution.
- The storage account roles to grant to the managed identity of the Recovery Services vault vary with the SKU (premium or standard) of the storage account used for cache storage.

| Storage SKU | Roles to be granted |
|---|---|
| Standard | Contributor, Storage Blob Data Contributor |
| Premium | Contributor, Storage Blob Data Owner |

- In the documentation referenced above, configuring a private endpoint for cache storage is optional. In this case, however, we have to configure either a Private Endpoint or a Service Endpoint for cache storage because the VMs are behind Standard Load Balancer.
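The SKU-to-role mapping can be captured as a small lookup, which is handy when scripting the role assignments. This is only an illustrative sketch: `roles_for_cache_storage` is a hypothetical helper, and it assumes the SKU is given either as a tier name or as a full SKU name such as `Standard_LRS` or `Premium_LRS`.

```python
# Roles to grant to the vault's managed identity, keyed by cache storage tier.
CACHE_STORAGE_ROLES = {
    "standard": ("Contributor", "Storage Blob Data Contributor"),
    "premium": ("Contributor", "Storage Blob Data Owner"),
}


def roles_for_cache_storage(sku):
    """Return the roles for a tier ("Premium") or a full SKU name ("Premium_LRS")."""
    tier = sku.strip().split("_")[0].lower()
    if tier not in CACHE_STORAGE_ROLES:
        raise ValueError(f"unknown storage SKU: {sku!r}")
    return CACHE_STORAGE_ROLES[tier]


if __name__ == "__main__":
    print(roles_for_cache_storage("Standard_LRS"))
    print(roles_for_cache_storage("Premium"))
```

Raising an error on an unknown SKU avoids silently granting the wrong role.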
Summary and customer decision
Several options can resolve this situation, and each has pros and cons. After I explained them to the customer, they chose option #6.
| Option | Does traffic leave the Azure boundary even with Microsoft network routing? | Is a public IP needed? | Cost | Configuration points | Remarks |
|---|---|---|---|---|---|
| 1 | Yes in some cases. | No | Outgoing traffic cost might increase. | On-premises firewall rules | With forced tunneling, storage replication traffic goes to the Internet. |
| 2 | No | Yes | | NSG (inbound/outbound) | |
| 3 | No | Yes | | NSG (especially outbound) | |
| 4 | No | Yes | | Public Load Balancer (outbound rule) | |
| 5 | No | Yes | Azure Firewall is expensive. | UDR, firewall rules | |
| 6 | No | Yes if NAT Gateway is used. | | Grant roles to the managed identity of the Recovery Services vault. | |