Microsoft Mission Critical Blog

MSL correction from clone to multistate HANA DB Cluster SUSE activation

AnuradhaKarnam

Microsoft

Oct 28, 2025

Introduction:

SAP HANA system replication (HSR) requires one primary node and at least one secondary node. Every change made to the data on the primary node is replicated synchronously to the secondary node, so the secondary always holds a consistent, up-to-date copy of the data. That copy is crucial for maintaining the integrity and availability of the data.

Problem Description:

An Azure VM was in a degraded state, which caused a major outage because the SAP cluster could not start. The node health score (-1000000) did not reset automatically after the VM was redeployed and stayed in place until it was cleared manually.

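Consider the following configuration if your cluster nodes run SLES 12 or later with the classic SAPHanaSR resource agents. Note that a promotable clone is not supported with these agents; the HANA resource must be wrapped as a multi-state (ms) resource.

Replace the <placeholders> with your HANA system ID and instance number.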

sudo crm configure primitive rsc_SAPHana_<HANA SID>_HDB<instance number> ocf:suse:SAPHana \
  operations \$id="rsc_sap_<HANA SID>_HDB<instance number>-operations" \
  op start interval="0" timeout="3600" \
  op stop interval="0" timeout="3600" \
  op promote interval="0" timeout="3600" \
  op monitor interval="60" role="Master" timeout="700" \
  op monitor interval="61" role="Slave" timeout="700" \
  params SID="<HANA SID>" InstanceNumber="<instance number>" PREFER_SITE_TAKEOVER="true" \
  DUPLICATE_PRIMARY_TIMEOUT="7200" AUTOMATED_REGISTER="false"

sudo crm configure ms msl_SAPHana_<HANA SID>_HDB<instance number> rsc_SAPHana_<HANA SID>_HDB<instance number> \
  meta notify="true" clone-max="2" clone-node-max="1" \
  target-role="Started" interleave="true"

sudo crm resource meta msl_SAPHana_<HANA SID>_HDB<instance number> set priority 100

Cutover steps: The cutover consists of pre-steps, execution steps, post-validation steps, and a rollback plan.

The pre-steps are the preparations and checks that must pass before the change starts. The execution steps are the core actions of the change and should be followed exactly, in order. The post-validation steps verify that the cluster and replication work as expected afterwards. The rollback plan restores the previous configuration if the change has to be reverted.

Pre-Steps:

Check cluster status:

crm status
crm configure show
SAPHanaSR-showAttr

Ensure no pending operations or failed resources:

crm_mon -1

Confirm replication is healthy:

hdbnsutil -sr_state
SAPHanaSR-showAttr

Back up the current configuration:

crm configure show > /root/cluster_config_backup.txt
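Before deleting anything, it can help to confirm from the saved backup which wrapper type the cluster currently uses. The following is a minimal sketch, not part of the original procedure: `wrapper_type` is a hypothetical helper that only reads the text file produced by `crm configure show` and never touches the cluster. It assumes the dump starts each wrapper definition with `ms <name>` or `clone <name>` at the beginning of a line.

```shell
# wrapper_type CONFIG_FILE WRAPPER_NAME
# Hypothetical helper: prints "ms", "clone", or "missing" depending on how
# WRAPPER_NAME is defined in a saved "crm configure show" dump.
wrapper_type() {
  # $1 = path to the saved configuration, $2 = wrapper resource name
  if grep -Eq "^ms[[:space:]]+$2([[:space:]]|\$)" "$1"; then
    echo "ms"
  elif grep -Eq "^clone[[:space:]]+$2([[:space:]]|\$)" "$1"; then
    echo "clone"
  else
    echo "missing"
  fi
}
```

For example, `wrapper_type /root/cluster_config_backup.txt msl_SAPHana_<SID>_HDB<instance>` should print `clone` before the correction. Run against a fresh dump after the change, it should print `ms`.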

Execution Steps:

Enable maintenance mode:

sudo crm configure property maintenance-mode=true

Delete the incorrect clone resource:

crm configure delete msl_SAPHana_<SID>_HDB<instance>

Recreate the wrapper as a multi-state (ms) resource:

sudo crm configure ms msl_SAPHana_<SID>_HDB<instance> rsc_SAPHana_<SID>_HDB<instance> meta notify="true" clone-max="2" clone-node-max="1" target-role="Started" interleave="true" maintenance="true"
sudo crm resource meta msl_SAPHana_<SID>_HDB<instance> set priority 100

Disable cluster-wide maintenance mode:

crm configure property maintenance-mode=false

Refresh the resource, verify replication, and then take the resource out of maintenance:

sudo crm resource refresh msl_SAPHana_<SID>_HDB<instance>
sleep 10
# Before continuing, check that the HSR status matches across SAPHanaSR-showAttr, crm_mon -A -1, and hdbnsutil -sr_state
sudo crm resource maintenance msl_SAPHana_<SID>_HDB<instance> off
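The fixed 10-second pause is a rule of thumb. If you prefer to poll until the cluster has settled, the wait can be sketched as a small retry loop. This is an illustrative sketch under stated assumptions: `retry_until` is a hypothetical helper, and the check command passed to it is something you supply yourself (it is not part of crmsh or SAPHanaSR).

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds, pausing DELAY_SECONDS
# between tries. Returns 0 on the first success, 1 if every attempt fails.
retry_until() {
  ru_attempts="$1"
  ru_delay="$2"
  shift 2
  ru_try=1
  while [ "$ru_try" -le "$ru_attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Do not sleep after the final failed attempt
    if [ "$ru_try" -lt "$ru_attempts" ]; then
      sleep "$ru_delay"
    fi
    ru_try=$((ru_try + 1))
  done
  return 1
}
```

A call such as `retry_until 6 10 hsr_in_sync` would try up to six times, ten seconds apart, where `hsr_in_sync` stands for your own check function. Take the resource out of maintenance only after the check succeeds.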

Post Validation steps:

crm status
crm configure show
SAPHanaSR-showAttr

Rollback Plan:

Enable maintenance mode:

crm configure property maintenance-mode=true
sudo crm resource maintenance msl_SAPHana_<SID>_HDB<instance> on

Restore the configuration from the backup:

crm configure load update /root/cluster_config_backup.txt

Recreate the previous clone configuration if needed:

crm configure clone msl_SAPHana_<SID>_HDB<instance> rsc_SAPHana_<SID>_HDB<instance> \
  meta notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true promotable=true

Disable maintenance and refresh resources:

crm configure property maintenance-mode=false
sudo crm resource refresh msl_SAPHana_<SID>_HDB<instance>
sleep 10
sudo crm resource maintenance msl_SAPHana_<SID>_HDB<instance> off

Perform the steps below during the actual execution:

Task Description	Team
Pre-step: Submit a CAB request for approval	Basis
Perform Pre-checks
Check cluster status (SBD, pacemaker and corosync services, sbd messages, iSCSI, constraints):	Basis
crm status	Basis
crm configure show	Basis
SAPHanaSR-showAttr	Basis
Ensure no pending operations or failed resources:	Basis
crm_mon -R1 -Af -1	Basis
Confirm replication is healthy:	Basis
hdbnsutil -sr_state	Basis
Back up the current configuration (pre-change):	Basis
crm configure show > /hana/shared/SID/dbcluster_backup_prechange.txt	Basis
crm configure show | sed -n '/primitive rsc_SAPHana_SID_HD/,/^$/p'	Basis
crm configure show | sed -n '/clone msl_SAPHana_SID_HD/,/^$/p'	Basis
Execution
Get the go-ahead from the leadership team	Basis
Step 0 – Put cluster into maintenance mode	Basis
crm resource maintenance g_ip_SID_HD on	Basis
Back up the current configuration while the cluster, msl, and g_ip are in maintenance:	Basis
crm configure show > /hana/shared/SID/dbcluster_backup_prehealth.txt	Basis
Step 1 – (If not already done) Clear the Node 1 health attribute and ensure topology/azure-events are running on both nodes (this avoids scheduler surprises when we re-manage)	Basis
Execute on m1vms (ideally, it can be executed on any node):	SOPS
crm_attribute -N vm -n '#health-azure' -v 0	SOPS
crm_attribute --node vm --delete --name "azure-events-az_curNodeState"	SOPS
crm_attribute --node vm --delete --name "azure-events-az_pendingEventIDs"	SOPS
crm resource cleanup health-azure-events-cln	Basis
crm resource cleanup cln_SAPHanaTopology_SID_HD	Basis
Back up the current configuration once the health correction is complete and the msl correction is still pending:	Basis
crm configure show > /hana/shared/SID/dbcluster_backup_premsl.txt	Basis
Step 2 – Convert the wrapper inside a single atomic transaction. We delete the promotable clone wrapper only (not the primitive), then create the ms wrapper with the same name, msl_SAPHana_SID_HD, so that existing colocation and order constraints that reference the name keep working.	Basis
Remove the promotable clone wrapper (keeps the rsc_SAPHana_SID_HD primitive intact):	Basis
crm configure delete msl_SAPHana_SID_HD	Basis
Recreate as multi-state (ms) for classic agents:	Basis
sudo crm configure ms msl_SAPHana_SID_HD rsc_SAPHana_SID_HD meta notify="true" clone-max="2" clone-node-max="1" target-role="Started" interleave="true" maintenance="true"	Basis
sudo crm resource meta msl_SAPHana_SID_HD set priority 100	Basis
Step 3 – Re‑enable cluster management of IP and HANA	Basis
Prechecks by MSFT, SUSE Teams	MSFT/SUSE
Precheck by BASIS Team	Basis
crm configure property maintenance-mode=false	Basis
crm resource refresh msl_SAPHana_SID_HD	Basis
sleep 10	Basis
crm resource maintenance msl_SAPHana_SID_HD off	Basis
crm resource maintenance g_ip_SID_HD off	Basis
Validation	Basis
crm_mon -R1 -Af -1	Basis
crm status	Basis
crm configure show	Basis
SAPHanaSR-showAttr	Basis
Rollback Plan
Enable maintenance mode:	Basis
crm configure property maintenance-mode=true	Basis
crm resource maintenance msl_SAPHana_SID_HD on	Basis
crm resource maintenance g_ip_SID_HD on	Basis
Restore the configuration from backup: decide which state to revert to and use the respective backup (prechange, prehealth, or premsl)	Basis
crm configure load update /hana/shared/SID/dbcluster_backup_<prechange|prehealth|premsl>.txt	Basis
Recreate the previous clone configuration if needed:	Basis
crm configure clone msl_SAPHana_SID_HD rsc_SAPHana_SID_HD meta notify=true clone-max=2 clone-node-max=1 target-role=Started interleave=true promotable=true maintenance="true"	Basis
Disable maintenance and refresh resources:	Basis
crm configure property maintenance-mode=false	Basis
crm resource refresh msl_SAPHana_SID_HD	Basis
sleep 10	Basis
crm resource maintenance msl_SAPHana_SID_HD off	Basis
crm resource maintenance g_ip_SID_HD off	Basis
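The runbook takes three backups at different points (pre-change, pre-health-correction, and pre-msl-correction), and the rollback must load exactly one of them. As a hedged illustration, the choice can be made explicit with a small helper. `rollback_backup_path` is a hypothetical name; the file naming simply follows the backup commands in the table above.

```shell
# rollback_backup_path SID STATE
# Hypothetical helper: prints the backup file for the chosen rollback state.
# STATE must be one of: prechange, prehealth, premsl.
rollback_backup_path() {
  case "$2" in
    prechange|prehealth|premsl)
      echo "/hana/shared/$1/dbcluster_backup_$2.txt"
      ;;
    *)
      # Refuse anything else so a typo cannot load the wrong file
      echo "unknown rollback state: $2" >&2
      return 1
      ;;
  esac
}
```

With the cluster and resources already in maintenance, the restore step could then read `crm configure load update "$(rollback_backup_path SID premsl)"`, with SID replaced by your HANA system ID.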

Important Points:

1. Are there known version-specific considerations when migrating from clone to ms?

If you are using SAPHanaSR, ensure that you use 'ms'. If you are using SAPHanaSR-angi, use 'clone'.

2. Does this change apply across the board on SUSE OS and/or Pacemaker versions?

There are three different sets of HANA resource agents and SRHook scripts: two older ones and one newer one.

The packages for the older ones are:

SAPHanaSR which is for Scale-Up HANA clusters.

SAPHanaSR-ScaleOut which is for Scale-Out HANA clusters.

The package for the new one is:

SAPHanaSR-angi which is for both Scale-up and Scale-out clusters. (angi stands for "advanced next generation interface").

When using the older SAPHanaSR or SAPHanaSR-ScaleOut resource agents and SRHook scripts, SUSE only supports the multi-state (ms) clone type for the SAPHana (scale-up) or SAPHanaController (scale-out) resource. The older resource agents and scripts are supported on all Service Packs of SLES for SAP 12 and 15.

When using the newer SAPHanaSR-angi resource agents and scripts, SUSE only supports the regular clone type for the SAPHanaController resource (scale-up AND scale-out) with the "promotable=true" meta-attribute set on the clone. The newer "angi" resource agents and scripts are supported on SLES for SAP 15 SP5 and higher and on SLES for SAP 16 when it is released later this year.

So, with SLES for SAP 15 SP5 and higher, you can use either the older or the newer resource agents and scripts. For all Service Packs of SLES for SAP 12 and Service Packs of SLES for SAP 15 prior to SP5, you must use the older resource agents and scripts. Starting with SLES for SAP 16, you must use the new angi resource agents and scripts.
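The support rules above can be condensed into a small lookup. The sketch below only restates this article's summary and is not an official SUSE support matrix. `supported_wrappers` is a hypothetical helper that takes the SLES for SAP major version and Service Pack as plain integers and prints the wrapper types described as supported: `ms` for the older agents, `clone` for the promotable clone used with SAPHanaSR-angi.

```shell
# supported_wrappers MAJOR [SERVICE_PACK]
# Hypothetical helper: prints "ms", "ms clone", "clone", or "unknown".
# Arguments are assumed to be plain integers; there is no input validation.
supported_wrappers() {
  major="$1"
  sp="${2:-0}"
  if [ "$major" -ge 16 ]; then
    # SLES for SAP 16 onward: angi agents only
    echo "clone"
  elif [ "$major" -eq 15 ] && [ "$sp" -ge 5 ]; then
    # 15 SP5 and higher: either agent set
    echo "ms clone"
  elif [ "$major" -eq 12 ] || [ "$major" -eq 15 ]; then
    # All of 12, and 15 before SP5: older agents only
    echo "ms"
  else
    echo "unknown"
  fi
}
```

For instance, `supported_wrappers 15 4` prints `ms`, while `supported_wrappers 15 5` prints `ms clone`. Which of the two applies on 15 SP5 and higher depends on which package is actually installed.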

Installing the new SAPHanaSR-angi package automatically uninstalls the older SAPHanaSR or SAPHanaSR-ScaleOut package if it is already installed. SUSE has published a blog post on how to migrate from the older resource agents and scripts to the newer ones; see the referenced SUSE link.

Conclusion:

Set up system replication and confirm that it is active; this is crucial to avoid business disruption during critical operational hours. With the wrapper type corrected to match the installed resource agents, the cluster architecture is in a supported state, which strengthens both resilience and business continuity and leaves the system better prepared for future demands.