Implementing SAP HANA scale-up high availability on Disaster Region

Published Jun 10 2022 08:00 AM
Microsoft

DISCLAIMER

 

This article describes the procedure to prepare SAP HANA scale-up high availability in a disaster recovery (DR) region, running on Red Hat Enterprise Linux on Azure. The procedure has been initially tested by Red Hat and Microsoft engineering and should be used as a basis for setting up corresponding pilot implementations. In its current state, the solution is still unverified, as mentioned in the Red Hat article, and must be considered experimental. It is highly recommended to engage Red Hat and SAP consulting services before implementing this solution in your organization. Carefully read the disclaimer from Red Hat about this procedure before you proceed with the setup on Azure.

 

Solution overview

 

In the figure below, the primary database on hanadb1 in the production region replicates data changes synchronously to hanadb2 in the same region. The primary database on hanadb1 also replicates data changes asynchronously to hanadb3 in another region. The secondary node hanadb3 is in turn the source system for a further secondary database on hanadb4, located in the same region as hanadb3. For more information on this system replication setup, see SAP HANA Multi-target System Replication | SAP Help Portal.

 

sap-hana-disaster-recovery-cluster.png

 

To configure the above disaster recovery setup of SAP HANA scale-up with a high availability cluster, you need to configure two independent clusters, one in each region. In the primary region, configure SAP HANA scale-up high availability between hanadb1 and hanadb2 as documented in High availability of SAP HANA on Azure VMs on RHEL or High availability of SAP HANA Scale-up with ANF on RHEL, depending on your storage type (local or NFS). Similarly, in the secondary (DR) region, set up another independent SAP HANA scale-up cluster between hanadb3 and hanadb4 following the same document. You then configure SAP HANA multi-target system replication and the cluster resources in the DR region as described in the configuration steps section below.

 

The cluster in the DR region is ready to run, but its services are stopped. The cluster resources in the DR region (such as SAPHana_HN1_03-Clone, g_ip_HN1_03, and hanadbx_nfs) are configured but placed in unmanaged mode, so the cluster does not handle them automatically. When the primary region goes down and a failover to the DR region is initiated, you need to manually take over hanadb3 as the new primary. HANA system replication between hanadb3 and hanadb4 should be active before you start the cluster service in the DR region. After starting the cluster services, you can put the resources back in managed mode.

 

Key points on the setup

 

The configuration of the SAP HANA scale-up high availability cluster in the DR region looks identical to that of the primary region, but you should understand some key differences and take the following points into account.

 

  • The SAP HANA scale-up high availability setup in the DR region is identical to the one in the primary region, except for the hostnames.
  • To establish HANA system replication between the primary and DR regions, communication between the nodes must be open in both directions.
  • HANA system replication is established as described in the overview section and should always be active across all sites.
  • There is no automatic failover from the production region to the DR region. Failover to the DR region is manual.
  • After a manual failover to the DR region, all cluster functions must be manually activated again.
  • Sites running in the primary and DR regions need to be kept in sync in terms of changes and patch levels. For example, if you've changed a constraint in the primary region cluster, the same change needs to be made in the DR region cluster.
  • The database connection on all clients should point to the virtual IP of the primary region. It needs to be changed after a manual failover to the DR region.
  • Automatic start of cluster services on VM boot should never be disabled in the primary region, but you need to disable it in the DR region.

 

Configuration steps

 

Prerequisite: Set up HANA clusters in the primary and DR regions

 

  • Configure two independent SAP HANA scale-up high availability clusters, one in each region, as documented in High availability of SAP HANA on Azure VMs on RHEL.
  • Ensure that the SAP HANA System ID (SID) and instance number are the same in both regions; only the hostnames differ.
  • If your HANA file systems are NFS mounts, configure the two independent clusters as documented in High availability of SAP HANA Scale-up with ANF on RHEL instead, and maintain the NFS file system entries in /etc/fstab.

NOTE: For cluster-managed mounts, it is recommended to add the file system entries to /etc/fstab with the 'noauto' option. The 'noauto' option prevents the file system from being mounted automatically after a reboot, so mounts are handled either manually or by the cluster.
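As a minimal sketch of the note above, the following shell function checks whether an /etc/fstab-style line carries the 'noauto' option. The helper name has_noauto and the sample entry (address, export name, and mount options) are hypothetical and only serve as an illustration; use the entries from your own environment.

```shell
#!/usr/bin/env bash
# Sketch: report whether an fstab-style line uses the 'noauto' mount option.
# has_noauto is a hypothetical helper, not part of any SAP or Red Hat tool.
has_noauto() {
  # Field 4 of an fstab entry holds the comma-separated mount options.
  opts=$(printf '%s\n' "$1" | awk '{print $4}')
  case ",${opts}," in
    *,noauto,*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Hypothetical example entry; address, export, and options are placeholders.
sample='10.0.0.4:/hanadb3-data-mnt00001 /hana/data nfs rw,vers=4,hard,noauto 0 0'
if has_noauto "$sample"; then
  echo "noauto is set"
else
  echo "noauto is missing"
fi
```

To check a real system, you could feed the relevant lines of /etc/fstab to the function one by one, for example from a `while read` loop.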

 

Configure cluster resources in DR region

 

After configuring an independent SAP HANA scale-up high availability cluster in the DR region, perform the following steps on the DR region cluster. These steps apply only to the DR region.

 

  1. Disable automatic start of the cluster service on VM boot in the DR region cluster.
    pcs cluster disable --all
  2. Put the resources in unmanaged mode in the DR region cluster.
    # Place HANA resource in unmanaged mode
    pcs resource unmanage SAPHana_HN1_03-Clone
    # Place virtual IP group (contains virtual IP and probe port) resource in unmanaged mode
    pcs resource unmanage g_ip_HN1_03
    If the HANA file systems are on NFS mounts, also put the file system resources on the DR site into unmanaged mode. This step applies only when NFS file systems are used for HANA.
    # Place filesystem group resources in unmanaged mode
    pcs resource unmanage hanadb3_nfs
    pcs resource unmanage hanadb4_nfs
  3. Stop the cluster.
    # Stop the cluster after placing resources in unmanaged mode
    pcs cluster stop --all

NOTE: While the resources are in unmanaged state and the cluster is stopped, it is highly recommended to start the cluster in the secondary region on a regular basis to ensure that the cluster service comes up. Because the resources are already unmanaged, they remain in that state after the cluster starts. Once the cluster is up and its services are running, you can stop it again. You can check the cluster state with the following command.

 

pcs status --full
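The preparation steps in this section can be collected into one script. The following is a minimal sketch under stated assumptions: the names DRY_RUN, run, and prepare_dr_cluster are hypothetical, the resource names are the ones used in this article, and by default the script only prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: the DR-region preparation steps as one script with a dry-run switch.
# DRY_RUN, run, and prepare_dr_cluster are hypothetical names for this example.
DRY_RUN="${DRY_RUN:-1}"   # default: print commands instead of running them

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

prepare_dr_cluster() {
  # $1 = "nfs" when the HANA file systems are cluster-managed NFS mounts
  run pcs cluster disable --all
  run pcs resource unmanage SAPHana_HN1_03-Clone
  run pcs resource unmanage g_ip_HN1_03
  if [ "${1:-}" = "nfs" ]; then
    run pcs resource unmanage hanadb3_nfs
    run pcs resource unmanage hanadb4_nfs
  fi
  run pcs cluster stop --all
}

prepare_dr_cluster "${1:-}"
```

Review the printed commands first. Running the script with DRY_RUN=0 as root on a DR cluster node would execute them, so do that only after adjusting the resource names to your cluster.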

 

Establish system replication from primary to DR region

 

Establish system replication from node hanadb1 in primary region to the node hanadb3 in DR region.

 

  1. Stop the HANA database on hanadb3 and hanadb4.
    # Execute as <hanasid>adm on both hanadb3 and hanadb4
    sapcontrol -nr 03 -function StopSystem HDB
  2. Copy keys from hanadb1 in primary region to hanadb3 and hanadb4 of DR region.
    # Replace sidadm with your <hanasid>adm user (for example, hn1adm for SID HN1)
    # Copy keys from hanadb1 to hanadb3
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/data/SSFS_HN1.DAT sidadm@hanadb3:/usr/sap/HN1/SYS/global/security/rsecssfs/data/
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/key/SSFS_HN1.KEY sidadm@hanadb3:/usr/sap/HN1/SYS/global/security/rsecssfs/key/
    
    # Copy keys from hanadb1 to hanadb4
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/data/SSFS_HN1.DAT sidadm@hanadb4:/usr/sap/HN1/SYS/global/security/rsecssfs/data/
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/key/SSFS_HN1.KEY sidadm@hanadb4:/usr/sap/HN1/SYS/global/security/rsecssfs/key/
  3. Register hanadb3 as secondary of hanadb1 in asynchronous replication mode. Log in as <hanasid>adm on hanadb3.
    hdbnsutil -sr_register --remoteHost=hanadb1 --remoteInstance=03 --replicationMode=async --operationMode=logreplay --name=SITE3
  4. Start HANA database on hanadb3.
    # Execute command using <hanasid>adm on hanadb3
    sapcontrol -nr 03 -function StartSystem HDB
  5. Enable system replication on hanadb3.
    # Execute command using <hanasid>adm on hanadb3
    hdbnsutil -sr_enable --name=SITE3
  6. Register hanadb4 as secondary of hanadb3.
    # Execute command using <hanasid>adm on hanadb4
    hdbnsutil -sr_register --remoteHost=hanadb3 --remoteInstance=03 --replicationMode=sync --operationMode=logreplay --name=SITE4
  7. Start HANA database on hanadb4.
    # Execute command using <hanasid>adm on hanadb4
    sapcontrol -nr 03 -function StartSystem HDB
  8. After establishing system replication between primary and DR region, check the system replication on hanadb1 in primary region.
    # Execute command using <hanasid>adm in primary node (hanadb1) in primary region
    python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py
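When the status check from the last step is scripted, the exit code of systemReplicationStatus.py is easier to evaluate than its text output. The sketch below maps the exit code to a state name. The helper name sr_status_name is hypothetical, and the mapping (10 to 15) follows the return codes commonly documented for this script; treat it as an assumption and verify it against your HANA revision.

```shell
#!/usr/bin/env bash
# Sketch: translate the exit code of systemReplicationStatus.py into text.
# sr_status_name is a hypothetical helper. The mapping is an assumption based
# on the commonly documented return codes of the script.
sr_status_name() {
  case "$1" in
    10) echo "NoHSR" ;;
    11) echo "Error" ;;
    12) echo "Unknown" ;;
    13) echo "Initializing" ;;
    14) echo "Syncing" ;;
    15) echo "Active" ;;
    *)  echo "Unexpected($1)" ;;
  esac
}

# Example with a fixed value. On a real system, capture the script's exit code:
#   python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py; rc=$?
rc=15
echo "System replication state: $(sr_status_name "$rc")"
```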

 

Failover to DR region

 

When the primary region goes down and the business decides to fail over to the DR region, follow the steps below to take over hanadb3 in the DR region as the new primary.

 

  1. Perform a takeover on hanadb3.
    # Execute command using <hanasid>adm on hanadb3
    hdbnsutil -sr_takeover --suspendPrimary
  2. Check the system replication status on hanadb3. HANA system replication between hanadb3 and hanadb4 should be active after the takeover.
    # Execute command using <hanasid>adm in new primary node (hanadb3) in disaster region
    python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py
  3. Start the cluster in DR region.
    # Start the cluster in disaster region
    pcs cluster start --all
  4. Check the status of the cluster. The resources should still be in unmanaged mode.
    pcs status --full
  5. Clean up the resources and place them in managed mode.
    # If any resources have failed after starting the cluster, clean them up.
    pcs resource cleanup SAPHana_HN1_03-Clone
    
    # Only applicable if you use NFS mounts for the HANA file systems
    pcs resource manage hanadb3_nfs
    pcs resource manage hanadb4_nfs
    
    # Place HANA and virtual IP group resources in managed mode
    pcs resource manage SAPHana_HN1_03-Clone
    pcs resource manage g_ip_HN1_03
  6. Enable the cluster service to start on VM boot.
    # Execute command in disaster region cluster, i.e. on hanadb3 or hanadb4.
    pcs cluster enable --all

Important: After the DR region becomes the new primary for the HANA database, you need to update the database connection on all clients to use the new hostname.
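Because system replication between hanadb3 and hanadb4 should be active before the cluster is started, a scripted failover may want to wait for that state between steps 2 and 3. The following is a minimal sketch of such a guard. The function name wait_for_sr_active and its parameters are hypothetical, and it assumes that exit code 15 of systemReplicationStatus.py means "Active", as commonly documented. The demo call uses a stub command instead of the real script.

```shell
#!/usr/bin/env bash
# Sketch: poll a status command until it exits with 15 (assumed to mean that
# system replication is active), or give up after a number of attempts.
# Usage: wait_for_sr_active "<status command>" <max tries> <pause in seconds>
wait_for_sr_active() {
  status_cmd="$1"
  max_tries="$2"
  pause="$3"
  try=1
  while [ "$try" -le "$max_tries" ]; do
    rc=0
    sh -c "$status_cmd" >/dev/null 2>&1 || rc=$?
    if [ "$rc" -eq 15 ]; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$try" -lt "$max_tries" ]; then
      sleep "$pause"
    fi
    try=$((try + 1))
  done
  return 1
}

# Demo with a stub that always reports "active" (exit code 15).
if wait_for_sr_active 'exit 15' 3 0; then
  echo "Replication is active; the cluster can be started."
else
  echo "Replication is not active; do not start the cluster yet."
fi
```

On a real system, the stub would be replaced by the systemReplicationStatus.py call from step 2, run as <hanasid>adm, while the pcs commands in the later steps are run as root.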

 

Configure former primary as new secondary site

 

After a failover to the DR region, you typically want the former primary region to become your new secondary region. Perform the steps below only in the former primary region.

 

  1. Disable the cluster service from starting on VM boot.
    # Execute command in former primary node i.e. hanadb1
    pcs cluster disable --all
  2. Put the resources in unmanaged mode.
    # Place HANA resource in unmanaged mode
    pcs resource unmanage SAPHana_HN1_03-Clone
    
    # Place virtual IP group (contains virtual IP and probe port) resource in unmanaged mode
    pcs resource unmanage g_ip_HN1_03
    If the HANA file systems are on NFS mounts, also put the file system resources on the former primary site into unmanaged mode. This step applies only when NFS file systems are used for HANA.
    # Place filesystem group resources in unmanaged mode
    pcs resource unmanage hanadb1_nfs
    pcs resource unmanage hanadb2_nfs
  3. Stop the cluster.
    # Stop the cluster after placing resources in unmanaged mode
    pcs cluster stop --all
  4. Stop HANA database on hanadb1 and hanadb2, if running.
    # Execute command using <hanasid>adm on hanadb1 and hanadb2
    sapcontrol -nr 03 -function StopSystem HDB
  5. Clean up SAP HANA replication setup on former primary.
    # Execute command using <hanasid>adm
    hdbnsutil -sr_cleanup --force
  6. Copy keys from hanadb3 in new primary to hanadb1 and hanadb2.
    # Copy keys from hanadb3 to hanadb1
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/data/SSFS_HN1.DAT sidadm@hanadb1:/usr/sap/HN1/SYS/global/security/rsecssfs/data/
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/key/SSFS_HN1.KEY sidadm@hanadb1:/usr/sap/HN1/SYS/global/security/rsecssfs/key/
    
    # Copy keys from hanadb3 to hanadb2
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/data/SSFS_HN1.DAT sidadm@hanadb2:/usr/sap/HN1/SYS/global/security/rsecssfs/data/
    scp /usr/sap/HN1/SYS/global/security/rsecssfs/key/SSFS_HN1.KEY sidadm@hanadb2:/usr/sap/HN1/SYS/global/security/rsecssfs/key/
  7. Register hanadb1 as secondary of hanadb3 in asynchronous replication mode. Log in as <hanasid>adm on hanadb1.
    hdbnsutil -sr_register --remoteHost=hanadb3 --remoteInstance=03 --replicationMode=async --operationMode=logreplay --name=SITE1
  8. Start HANA database on hanadb1.
    # Execute command using <hanasid>adm on hanadb1
    sapcontrol -nr 03 -function StartSystem HDB
  9. Enable system replication on hanadb1.
    # Execute command using <hanasid>adm on hanadb1
    hdbnsutil -sr_enable --name=SITE1
  10. Register hanadb2 as secondary of hanadb1.
    # Execute command using <hanasid>adm on hanadb2
    hdbnsutil -sr_register --remoteHost=hanadb1 --remoteInstance=03 --replicationMode=sync --operationMode=logreplay --name=SITE2
  11. Start HANA database on hanadb2.
    # Execute command using <hanasid>adm on hanadb2
    sapcontrol -nr 03 -function StartSystem HDB
  12. After establishing system replication between the new primary and former primary regions, check the system replication on hanadb3 in new primary region.
    # Execute command using <hanasid>adm in new primary node (hanadb3) in new primary region
    python /usr/sap/HN1/HDB03/exe/python_support/systemReplicationStatus.py

NOTE: While the resources are in unmanaged state and the cluster is stopped, it is highly recommended to start the cluster in the secondary region on a regular basis to ensure that the cluster service comes up. Because the resources are already unmanaged, they remain in that state after the cluster starts. Once the cluster is up and its services are running, you can stop it again.

Last update: Jun 09 2022 12:01 PM