Overview
High availability of SAP NetWeaver systems on Windows can be achieved by using Windows Server Failover Clustering. Resources configured in the cluster are highly available as long as the nodes that host them are up. However, a Windows cluster generally requires more than half of its nodes to be running in order to avoid a split-brain scenario, so a quorum must be configured in the failover cluster. Several quorum options are available for SAP NetWeaver on Azure VMs.
Setting up high availability of SAP workload on Azure protects applications from infrastructure maintenance or failure within a single Azure region. But it doesn’t provide protection from a widespread regional disaster. For Disaster Recovery (DR), applications running on Azure VMs can be protected by replicating components to another Azure region using Azure Site Recovery (ASR). This article describes how to achieve a highly available configuration in the DR region using ASR when SAP ASCS/ERS with SMB file shares on Windows is configured in the primary production region.
NOTE: This article focuses only on the SAP NetWeaver HA deployment with a file share cluster. If your system is set up using a “cluster shared disk” configuration, this article is not applicable.
IMPORTANT NOTE:
- The example shown in this article was tested with the following OS version, cluster share and quorum option:
- SAP ASCS/ERS OS version: Windows Server 2022 Datacenter
- Quorum: Cloud witness
- Cluster share: File share (SMB on Azure Files)
- Depending on the type of the underlying file share storage used for SAP workload, you have to adopt an appropriate method to replicate the storage data to the DR site. In this example, SMB on Azure Files is used, and its DR setup can be achieved by configuring a separate SMB share on Azure Files in the DR region and copying/synchronizing the data periodically using robocopy.
NOTE: In case you are using SMB on Azure NetApp Files (ANF), the DR setup can be achieved using cross-region replication. ANF is not available in all regions. Refer to Azure Products by Region to see if ANF is available in your DR region.
- Failover of other dependent services like DNS or Active Directory is not covered in this article.
- To replicate VMs using ASR for DR purposes, review supported regions.
- SMB shares on Azure Files are available in every Azure region, including all public and sovereign regions. Premium SMB file shares, which are recommended for SAP workload, are available in a subset of regions.
- ASR doesn’t replicate the Azure load balancer that is used as the virtual IP for the SAP ASCS/ERS cluster configuration in the source site. In the DR site, you need to create the load balancer manually, either beforehand or at the time of the failover event.
- The cloud witness is configured using Azure blob storage, so you need to configure a separate storage account in the DR region beforehand or at the time of the failover event.
- If you configured your SAP system using Windows DFS-N to support flexible SAPMNT share creation for SMB-based file share, then some of the post-steps like editing the UNC path in multiple areas won’t be required. You can directly modify SMB endpoint in DFS management console. But this article covers SAP systems installed without DFS-N namespace for SMB storage volumes.
- The procedure described here has not been tested with other OS releases, so it might need additional work depending on your implementation or with future OS releases. Make sure you test and document the entire procedure thoroughly in your environment.
SAP ASCS/ERS with SMB on Azure Files Disaster Recovery architecture
In the figure below, an SAP ASCS/ERS high availability cluster with SMB on Azure Files is configured in the primary Azure region. The cluster uses a cloud witness as its quorum option. To establish DR for the setup, Azure Site Recovery (ASR) is used to replicate the SAP ASCS and SAP ERS VMs across the sites. For the SMB file share, a separate SMB share on Azure Files is created in the DR region, and data is copied and synchronized periodically using robocopy.
NOTE: You can also use SMB on Azure NetApp Files (ANF) for an SAP ASCS/ERS cluster, but SMB on ANF is not covered in this blog.
SAP ASCS/ERS with SMB on Azure Files DR Architecture
As described in the example, to achieve an HA setup in the DR site for SAP ASCS/ERS, we need to make sure that all components that are part of the solution are replicated.
Components | DR setup |
SAP ASCS/ERS VMs | Replicate VMs using Azure Site Recovery |
SMB on Azure Premium Files | Create separate SMB on Azure Premium Files. Copy data using robocopy. |
Storage used for Cloud Witness | Create a separate storage in the DR region. |
Load balancer used for cluster virtual IP. | Create a separate load balancer in the DR region. |
Disaster Recovery (DR) site preparation
To achieve a similar highly available setup of SAP ASCS/ERS in the DR site, you need to make sure that all the components are replicated and available in the event of a failover.
Configure ASR for SAP ASCS/ERS and Application server VMs
- Deploy a resource group, virtual network, subnet and Recovery Services vault in the secondary (DR) region. For more information on networking in Azure VM disaster recovery, refer to prepare networking for Azure VM disaster recovery.
- Follow the instructions in the Tutorial to set up disaster recovery for Azure VMs document to configure ASR for SAP ASCS and ERS VMs.
- When you enable Azure Site Recovery for a VM to set up DR, the OS disk and local data disk(s) attached to the VM are replicated to the DR site. During replication, the VM's disk writes are sent to a cache storage account in the source region. Data is sent from there to the target (DR) region, and recovery points are generated from the data. When you fail over a VM during the DR process, a recovery point is used to restore the VM in the target (DR) region.
- After the VMs are replicated, the status will turn into “Protected” and the replication health will be “Healthy”.
Replicated VMs in Recovery Service Vault
Configure robocopy to replicate SMB on Azure files data to DR Region
Robocopy, short for “Robust File Copy”, is a command-line utility in Windows for copying files and directories from one location to another. It has been available as a standard feature in Windows since Windows Server 2008, so you do not have to install it separately on your Windows server.
- To copy/synchronize the data of SMB on Azure Premium Files to the DR region, you need to create a separate SMB on Azure Premium Files share in the DR region.
- Follow the same post steps (such as Active Directory integration and assigning the required access and roles) on the new SMB share on Azure Files as you did in the primary region.
- Before you can use Robocopy, you need to make sure the SMB share on Azure Files is accessible. The easiest way is to mount the share as a network drive on the Windows server/VM from which you plan to run Robocopy.
NOTE: You can use a UNC path as the source and target location in the robocopy command. It is not necessary to have the SMB share on Azure Files mounted at all times on the server from which you execute the command.
- You can execute Robocopy from any Windows server/VM, but make sure you can access both the source and the target SMB share on Azure Files from that server/VM.
- Execute the Robocopy command. The screenshot is just an example. Make sure you set the copy options appropriate for your requirements.
Example: Robocopy command to replicate data between two SMB on Azure Files storage
NOTE: You can schedule a job to execute Robocopy command periodically to copy the content of the SMB share on Azure Files.
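The copy and its scheduling could be sketched as follows. This is a minimal example rather than the exact command used in the screenshot. It assumes the primary endpoint ts3smbeast.file.core.windows.net and the share name sapmnt (both taken from the service example later in this article). The log path, script path, task name, service account and 30-minute interval are hypothetical. Note that /MIR also deletes files on the target that no longer exist on the source, so a dry run with /L is advisable first.

```powershell
# Sync-SapMnt.ps1 (hypothetical script name and location: C:\Scripts)
$source = '\\ts3smbeast.file.core.windows.net\sapmnt'    # primary share
$target = '\\ts3smbdrwest.file.core.windows.net\sapmnt'  # DR share
$log    = 'C:\Logs\robocopy-sapmnt.log'                  # assumed path; the folder must exist

# /MIR        mirror the directory tree (including deletions on the target)
# /COPY:DATSO copy data, attributes, timestamps, security (ACLs) and owner
# /DCOPY:DAT  copy directory data, attributes and timestamps
# /R /W       limit retries and wait time, /MT multithreaded copy, /NP no progress output
robocopy $source $target /MIR /COPY:DATSO /DCOPY:DAT /R:3 /W:10 /MT:16 /NP "/LOG+:$log"

# Robocopy exit codes of 8 or higher indicate that at least one copy failed
if ($LASTEXITCODE -ge 8) {
    throw "Robocopy failed with exit code $LASTEXITCODE"
}
```

```powershell
# Register a scheduled task that runs the script every 30 minutes.
# Account name and password are placeholders; the account needs access to both shares.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -ExecutionPolicy Bypass -File "C:\Scripts\Sync-SapMnt.ps1"'
$trigger = New-ScheduledTaskTrigger -Once -At (Get-Date) `
    -RepetitionInterval (New-TimeSpan -Minutes 30)
Register-ScheduledTask -TaskName 'Sync-SapMnt-DR' -Action $action -Trigger $trigger `
    -User 'SAPCONTOSO\svc-robocopy' -Password '<password>'
```

The interval determines how much file share data can be lost in a disaster, so choose it according to your recovery point objective.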
Configure the cloud witness for SAP ASCS/ERS in the DR Site
Tip: Based on your DR strategy, you can either execute this step when you are preparing your DR site like setting up ASR or you can execute at the time of the DR failover process.
- Create an Azure storage account on the DR site for the usage as a cloud witness.
Region | Storage account name |
Primary | ts3clusteastn |
Disaster Recovery | ts3drwestwitness |
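If you prefer to script this step, a minimal sketch with the Az PowerShell module could look like the following. The resource group name and the location are assumptions for illustration; only the storage account name comes from the table above. A standard general-purpose account with locally redundant storage is used here.

```powershell
# Requires the Az.Storage module and an authenticated session (Connect-AzAccount).
$resourceGroup = 'rg-sap-dr'          # hypothetical resource group in the DR region
$location      = 'westus2'            # hypothetical DR region
$accountName   = 'ts3drwestwitness'   # storage account name from the table above

New-AzStorageAccount -ResourceGroupName $resourceGroup -Name $accountName `
    -Location $location -SkuName Standard_LRS -Kind StorageV2

# The access key is needed later when the cluster quorum is switched to this account.
$witnessKey = (Get-AzStorageAccountKey -ResourceGroupName $resourceGroup -Name $accountName)[0].Value
```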
Configure the Standard Load Balancer for SAP ASCS/ERS in the DR Site
Tip: Based on your DR strategy, you can either execute this step when you are preparing your DR site like setting up ASR or you can execute at the time of the DR failover process.
Deploy an Azure Standard Load Balancer in the DR site, similar to the one you deployed in your primary region. If you create the load balancer beforehand, you won’t be able to assign VMs to the backend pool, since the VMs don't exist yet in the DR region. Create the backend pool as an empty pool instead. This still allows you to define the load balancing rules. You can assign the VMs in the DR region to the backend pool only after the DR failover of the VMs through ASR has been executed. Also, keep the following points in mind:
- Keep the probe port of the DR region load balancer the same as in the primary region.
- When VMs without public IP addresses are placed in the backend pool of the internal (no public IP address) Standard Azure load balancer, there will be no outbound internet connectivity from these VMs, unless additional configuration is performed to allow routing to public end points. For details on how to achieve outbound connectivity see public endpoint connectivity for Virtual Machines using Azure Standard Load Balancer in SAP high-ava....
Site | Frontend IP |
Primary Region - ASCS | 10.19.0.97 |
Primary Region - ERS | 10.19.0.105 |
DR Region - ASCS | 10.5.0.9 |
DR Region - ERS | 10.5.0.10 |
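A pre-created DR load balancer with an empty backend pool could be sketched with the Az PowerShell module as shown below. The frontend IP addresses come from the table above. All resource names, the location, and the probe ports (62000 and 62110) are assumptions; use the probe ports configured in your primary region. The rules use HA ports with floating IP, which is one common choice for SAP cluster setups, not necessarily the configuration used in this example landscape.

```powershell
# Requires the Az.Network module and an authenticated session (Connect-AzAccount).
$resourceGroup = 'rg-sap-dr'   # hypothetical
$location      = 'westus2'     # hypothetical
$vnet   = Get-AzVirtualNetwork -ResourceGroupName $resourceGroup -Name 'vnet-sap-dr'   # hypothetical name
$subnet = Get-AzVirtualNetworkSubnetConfig -VirtualNetwork $vnet -Name 'snet-sap-dr'   # hypothetical name

# Frontend IPs from the table above
$feAscs = New-AzLoadBalancerFrontendIpConfig -Name 'fe-ascs' -PrivateIpAddress '10.5.0.9'  -SubnetId $subnet.Id
$feErs  = New-AzLoadBalancerFrontendIpConfig -Name 'fe-ers'  -PrivateIpAddress '10.5.0.10' -SubnetId $subnet.Id

# Empty backend pool; the VMs are added after the ASR failover
$pool = New-AzLoadBalancerBackendAddressPoolConfig -Name 'be-ascs-ers'

# Probe ports are assumed values; they must match the primary region
$probeAscs = New-AzLoadBalancerProbeConfig -Name 'hp-ascs' -Protocol Tcp -Port 62000 -IntervalInSeconds 5 -ProbeCount 2
$probeErs  = New-AzLoadBalancerProbeConfig -Name 'hp-ers'  -Protocol Tcp -Port 62110 -IntervalInSeconds 5 -ProbeCount 2

# HA-ports rules (protocol All, port 0) with floating IP enabled
$ruleAscs = New-AzLoadBalancerRuleConfig -Name 'lbr-ascs' -FrontendIpConfiguration $feAscs `
    -BackendAddressPool $pool -Probe $probeAscs -Protocol All -FrontendPort 0 -BackendPort 0 `
    -EnableFloatingIP -IdleTimeoutInMinutes 30
$ruleErs  = New-AzLoadBalancerRuleConfig -Name 'lbr-ers' -FrontendIpConfiguration $feErs `
    -BackendAddressPool $pool -Probe $probeErs -Protocol All -FrontendPort 0 -BackendPort 0 `
    -EnableFloatingIP -IdleTimeoutInMinutes 30

New-AzLoadBalancer -ResourceGroupName $resourceGroup -Name 'lb-ts3-dr' -Location $location -Sku Standard `
    -FrontendIpConfiguration $feAscs, $feErs -BackendAddressPool $pool `
    -Probe $probeAscs, $probeErs -LoadBalancingRule $ruleAscs, $ruleErs
```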
Disaster Recovery (DR) failover event
[A] - Applicable to SAP ASCS Node, [B] - Applicable to SAP ERS Node, [C] - Applicable to SAP Dialog Nodes.
In case of a production DR failover event, the following procedure needs to be followed for the SAP ASCS/ERS and the SAP application instances. If you are using different Azure services with your SAP system, you may need to adjust your procedure accordingly. The failover procedure described here assumes that the system running in the primary region is unreachable or unavailable for some reason, and the DR failover process is therefore initiated. After the failover to the DR region is triggered, the system in the primary region remains down throughout.
- Perform the failover of SAP ASCS/ERS and all application instance VMs that are configured in ASR to the DR region. For more details on how to fail over, refer to the Tutorial to fail over Azure VMs to a secondary region for disaster recovery with Azure Site Recovery document.
NOTE: Use of Azure Site Recovery for SAP databases isn’t recommended. For more details on the DR recommendation for databases, refer to SAP database servers DR guidelines.
- After the failover is completed, the status of the replicated items in the Recovery Services vault looks like the following:
Failover completion status in Recovery Service Vault
- Update the IP address of the VMs in DNS or in host files (if maintained). In this example, update the IP address for SAP ASCS/ERS and all application servers. The ASCS/ERS server name registered in the Windows cluster is also maintained in DNS, so you need to update the IP address of the ASCS/ERS server name in DNS or in host files as well.
Component | Primary region IP address | DR region IP address |
ASCS Instance | 10.19.0.103 | 10.5.0.8 |
ERS Instance | 10.19.0.104 | 10.5.0.5 |
Dialog Instance | 10.19.0.106 | 10.5.0.7 |
ASCS Virtual Server Name (cluster resource) | 10.19.0.97 | 10.5.0.9 |
ERS Virtual Server Name (cluster resource) | 10.19.0.105 | 10.5.0.10 |
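If the host records are kept in a Windows DNS server, the A record updates from the table above could be sketched as follows. The zone name and the record names in the usage lines are placeholders, because the article does not list the DNS names. The function assumes that each name has exactly one A record.

```powershell
# Requires the DnsServer module (on the DNS server itself or via RSAT).
function Set-ARecordAddress {
    param(
        [Parameter(Mandatory)] [string] $ZoneName,
        [Parameter(Mandatory)] [string] $Name,
        [Parameter(Mandatory)] [string] $NewIpAddress
    )
    # Read the existing record and create a modified copy of it
    $old = Get-DnsServerResourceRecord -ZoneName $ZoneName -Name $Name -RRType A
    $new = [ciminstance]::new($old)
    $new.RecordData.IPv4Address = [System.Net.IPAddress]::Parse($NewIpAddress)
    # Replace the old record with the modified copy
    Set-DnsServerResourceRecord -ZoneName $ZoneName -OldInputObject $old -NewInputObject $new
}

# Usage with placeholder zone and record names; IP addresses are from the table above
Set-ARecordAddress -ZoneName 'sapcontoso.example' -Name '<ascs-virtual-name>' -NewIpAddress '10.5.0.9'
Set-ARecordAddress -ZoneName 'sapcontoso.example' -Name '<ers-virtual-name>'  -NewIpAddress '10.5.0.10'
```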
NOTE: The IP address of the ASCS/ERS virtual server name needs to be updated both in DNS and in the cluster configuration. See the later step that updates the IP address of the ASCS and ERS server name resources in the cluster.
- [A] [B] Log in to the SAP ASCS and SAP ERS servers/VMs and update the UNC path maintained in the symbolic links so that it points to the SMB share on Azure Files in the DR region (ts3smbdrwest.file.core.windows.net).
# Execute below command in PowerShell with Administrator
New-Item -Type SymbolicLink -Path "F:\usr\sap\TS3\SYS" -Target "\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS" -Force
New-Item -Type SymbolicLink -Path "F:\usr\sap\TS3\ASCS00\data" -Target "\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\ASCS00\data" -Force
New-Item -Type SymbolicLink -Path "F:\usr\sap\TS3\ASCS00\log" -Target "\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\ASCS00\log" -Force
New-Item -Type SymbolicLink -Path "F:\usr\sap\TS3\ASCS00\sec" -Target "\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\ASCS00\sec" -Force
- Change the value of SAPGLOBALHOST and SAPTRANSHOST to the DR storage account endpoint (ts3smbdrwest.file.core.windows.net) in the DEFAULT, ASCS, ERS and all APP instance profiles.
# SAPGLOBALHOST and SAPTRANSHOST value updated to DR storage endpoint in DEFAULT and instances profile
PS F:\usr\sap\TS3\SYS\profile> findstr /i /r /c:"SAPGLOBALHOST" /c:"SAPTRANSHOST" TS3* DEFAULT.PFL
TS3_ASCS00_ts3ascseast:SAPGLOBALHOST = ts3smbdrwest.file.core.windows.net
TS3_D00_mswinapp55:SAPGLOBALHOST = ts3smbdrwest.file.core.windows.net
TS3_ERS10_ts3erseast:SAPGLOBALHOST = ts3smbdrwest.file.core.windows.net
DEFAULT.PFL:SAPGLOBALHOST = ts3smbdrwest.file.core.windows.net
DEFAULT.PFL:SAPTRANSHOST = ts3smbdrwest.file.core.windows.net
DEFAULT.PFL:DIR_TRANS = \\$(SAPTRANSHOST)\sapmnt\trans
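The profile change itself can be made in any editor. As an alternative, the following sketch replaces the primary endpoint with the DR endpoint in DEFAULT.PFL and in the instance profiles. It assumes the primary endpoint is ts3smbeast.file.core.windows.net (as shown in the service example below), and it replaces every occurrence of that endpoint in the files, not only the SAPGLOBALHOST and SAPTRANSHOST lines. The backup suffix is a hypothetical choice.

```powershell
$profileDir  = 'F:\usr\sap\TS3\SYS\profile'
$oldEndpoint = 'ts3smbeast.file.core.windows.net'     # primary endpoint (assumed from the sc qc output)
$newEndpoint = 'ts3smbdrwest.file.core.windows.net'   # DR endpoint
$pattern     = [regex]::Escape($oldEndpoint)

Get-ChildItem -Path $profileDir -File |
    # DEFAULT.PFL plus instance profiles such as TS3_ASCS00_<host>; skips files with extensions
    Where-Object { $_.Name -eq 'DEFAULT.PFL' -or $_.Name -match '^TS3_[A-Z]+\d\d_[^.]+$' } |
    ForEach-Object {
        $text = Get-Content -Path $_.FullName -Raw
        if ($text -match $pattern) {
            # Keep a backup copy before changing the profile
            Copy-Item -Path $_.FullName -Destination "$($_.FullName).bak-dr"
            # Note: Set-Content writes in the default encoding of the PowerShell version in use
            ($text -replace $pattern, $newEndpoint) | Set-Content -Path $_.FullName -NoNewline
            Write-Output "Updated $($_.Name)"
        }
    }
```

Afterwards, the findstr command shown above can be used to confirm the result.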
- [A] [B] [C] SAP services (SAPSID_XX) configured in services.msc have an executable path defined with a parameter that points to the instance profile. The path to the profile is using the UNC path. This UNC path needs to be updated with the DR storage endpoint UNC path.
# Example to edit BINARY_PATH_NAME of SAPTS3_00 maintained in services.msc.
# Primary region SMB on Azure files endpoint: ts3smbeast.file.core.windows.net. It needs to be changed to secondary region SMB on Azure files endpoint: ts3smbdrwest.file.core.windows.net
C:\Windows\system32>sc qc SAPTS3_00
SERVICE_NAME: SAPTS3_00
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 3   DEMAND_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "F:\usr\sap\TS3\ASCS00\exe\sapstartsrv.exe" pf="\\ts3smbeast.file.core.windows.net\sapmnt\TS3\SYS\profile\TS3_ASCS00_ts3ascseast"
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : SAPTS3_00
        DEPENDENCIES       : RPCSS
                           : LanmanServer
        SERVICE_START_NAME : SAPCONTOSO\SAPServiceTS3

# Commands to change BINARY_PATH_NAME of the ASCS, ERS and PAS services. Run each command on the respective server where the service is created.
# The whole binPath value is enclosed in one pair of outer quotes; inner quotes are escaped with a backslash.
sc config SAPTS3_00 binPath= "\"F:\usr\sap\TS3\ASCS00\exe\sapstartsrv.exe\" pf=\"\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\profile\TS3_ASCS00_ts3ascseast\""
sc config SAPTS3_10 binPath= "\"F:\usr\sap\TS3\ERS10\exe\sapstartsrv.exe\" pf=\"\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\profile\TS3_ERS10_ts3erseast\""
sc config SAPTS3_00 binPath= "\"F:\usr\sap\TS3\D00\exe\sapstartsrv.exe\" pf=\"\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\profile\TS3_D00_mswinapp55\""

# Example: After changing BINARY_PATH_NAME
C:\Windows\system32>sc qc SAPTS3_00
SERVICE_NAME: SAPTS3_00
        TYPE               : 10  WIN32_OWN_PROCESS
        START_TYPE         : 3   DEMAND_START
        ERROR_CONTROL      : 1   NORMAL
        BINARY_PATH_NAME   : "F:\usr\sap\TS3\ASCS00\exe\sapstartsrv.exe" pf="\\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\profile\TS3_ASCS00_ts3ascseast"
        LOAD_ORDER_GROUP   :
        TAG                : 0
        DISPLAY_NAME       : SAPTS3_00
        DEPENDENCIES       : RPCSS
                           : LanmanServer
        SERVICE_START_NAME : SAPCONTOSO\SAPServiceTS3
- [A] [B] [C] Change the UNC path value maintained in the Path, RSEC_SSFS_DATAPATH, RSEC_SSFS_KEYPATH and SAPEXE environment variables to DR storage endpoint.
# Example showing the value of Path, RSEC* and SAPEXE env variable updated with DR storage endpoint
PS C:\Windows\system32> Get-ChildItem -Path Env: | Where-Object -Property Value -CLike "*ts3smbdrwest*"

Name                 Value
----                 -----
Path                 C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\Windows\System32\OpenSSH\;C:\Program Files\dotnet\;C:\Program Files (x86)\dotnet\;C:\Users\ts3adm\AppData\Local\Microsoft\WindowsApps;\\ts3smbdrwest.file.core.windows.net\sapmnt\T...
RSEC_SSFS_DATAPATH   \\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\global\security\rsecssfs\data
RSEC_SSFS_KEYPATH    \\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\global\security\rsecssfs\key
SAPEXE               \\ts3smbdrwest.file.core.windows.net\sapmnt\TS3\SYS\exe\uc\NTAMD64
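To check that nothing was missed in the service and environment variable steps, a read-only audit along the following lines can be run on each server. It is a sketch that assumes ts3smbeast.file.core.windows.net is the primary endpoint. It only reports leftovers and changes nothing. The 'User' scope covers the account that runs the script, so run it as each relevant SAP user.

```powershell
$oldEndpoint = 'ts3smbeast.file.core.windows.net'   # primary endpoint (assumed from the sc qc output)

# Services whose executable path still references the primary endpoint
$services = Get-CimInstance -ClassName Win32_Service |
    Where-Object { $_.PathName -like "*$oldEndpoint*" } |
    Select-Object Name, PathName

# Machine-level and current-user environment variables that still reference it
$variables = foreach ($scope in 'Machine', 'User') {
    $vars = [Environment]::GetEnvironmentVariables($scope)
    foreach ($name in $vars.Keys) {
        if ($vars[$name] -like "*$oldEndpoint*") {
            [pscustomobject]@{ Scope = $scope; Name = $name; Value = $vars[$name] }
        }
    }
}

$services  | Format-List
$variables | Format-List
```

No output means that no service path or environment variable in the checked scopes points to the primary endpoint anymore.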
- If you created an Azure standard load balancer in the DR region beforehand with an empty backend pool, add the ASCS/ERS VMs to that backend pool.
Backend pool configuration in DR load balancer
- [A] Update the IP address of the ASCS and ERS server name resources configured in the cluster to the frontend IPs configured in the load balancer (the one provisioned in the disaster recovery region).
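The screenshots show this change in Failover Cluster Manager. A PowerShell equivalent could be sketched as below. The IP resource names, the cluster network name and the subnet mask are assumptions; check them with Get-ClusterResource and Get-ClusterNetwork first. The probe port is deliberately left unchanged, in line with keeping it identical to the primary region.

```powershell
Import-Module FailoverClusters

# Assumed IP resource names mapped to the DR frontend IPs from the load balancer table
$ipResources = @{
    'SAP TS3 IP'     = '10.5.0.9'
    'SAP TS3 ERS IP' = '10.5.0.10'
}
$networkName = 'Cluster Network 1'   # assumed; verify with Get-ClusterNetwork
$subnetMask  = '255.255.255.0'       # assumed; use the mask of the DR subnet

foreach ($resourceName in $ipResources.Keys) {
    Get-ClusterResource -Name $resourceName |
        Set-ClusterParameter -Multiple @{
            Address    = $ipResources[$resourceName]
            SubnetMask = $subnetMask
            Network    = $networkName
            EnableDhcp = 0
        }
}

# Review the resulting parameters of one of the resources
Get-ClusterResource -Name 'SAP TS3 IP' | Get-ClusterParameter
```

The new values take effect when the IP resources are next brought online.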
ASCS server name - IP change in cluster
ERS server name - IP change in cluster
- [A] Change the quorum to the cloud witness storage account created in the disaster recovery region.
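As an alternative to the Failover Cluster Manager dialogs, the quorum change can be sketched in PowerShell as follows. The storage account name is the DR account from the cloud witness table; the access key is a placeholder that you need to replace with one of the account's keys.

```powershell
Import-Module FailoverClusters

# Point the cloud witness to the storage account in the DR region
Set-ClusterQuorum -CloudWitness -AccountName 'ts3drwestwitness' -AccessKey '<storage-account-access-key>'

# Show the resulting quorum configuration
Get-ClusterQuorum
```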
Update cloud witness by changing storage name in quorum settings
Update cloud witness with DR storage account name and secret key
- [A] Start the ASCS and ERS cluster roles in Failover Cluster Manager.
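The same can be done from PowerShell. In this sketch the role names 'SAP TS3' and 'SAP TS3 ERS' are assumptions; list the actual names with Get-ClusterGroup first.

```powershell
Import-Module FailoverClusters

# List the cluster roles to confirm their names
Get-ClusterGroup | Select-Object Name, OwnerNode, State

# Start the ASCS and ERS roles (assumed role names)
Start-ClusterGroup -Name 'SAP TS3'
Start-ClusterGroup -Name 'SAP TS3 ERS'
```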
Start ASCS and ERS cluster role
- [C] Update the user store on all dialog instance servers with the correct hostname of the database that is running in the DR region. Check SAP Note 1852017 for more details on how to update the hdbuserstore on Windows.
- [C] Start all dialog instances.
Status of all instances
Failback to the former primary region
Once the services in the primary region are available again, and you have scheduled the failback of your production landscape to the primary region, you need to:
- Re-protect failed over Azure VMs to the primary region. Refer to this document for more details.
- Schedule or run the Robocopy command to copy/synchronize data from the SMB share on Azure Files in the DR region to the primary region.
- In the event of a failure, follow the same post-failover steps described above.