SAP Multi-Instance Clustering
First published on MSDN on May 29, 2007

As most folks who are busy in the SAP world know, SAP used to support only a very restricted set of MSCS cluster configurations. SAP's basic support policy was along these lines:

·         As long as MSCS clustering was used for the database part only, SAP was fine with whatever configuration was used. This could be an Active/Active cluster, a configuration with more than two cluster nodes, etc. For SAP software this is all transparent, as long as no SAP applications are involved in the same MSCS cluster.

·         However, when you installed the SAP Central Instance (CI) in an MSCS cluster (with or without the database), SAP supported only one CI in the MSCS cluster. If a database like SQL Server was installed in the same cluster, the usual model was to run SQL Server on one node and the SAP CI on the other node. This latter configuration has been described in detail in the SAP installation documentation for many years.

·         SAP did not support cluster configurations where non-CI instances were clustered.

·         SAP did not support configurations of more than two nodes.

·         SAP did not support configurations with multiple CIs in one MSCS configuration.

Especially the last configuration, multiple CIs clustered in one two-node MSCS setup, came up on the request list time after time. With commodity servers becoming more powerful and with consolidation scenarios on high-end hardware, it seemed reasonable to do something like this. The reason SAP did not want to support it was that the cluster disk resource from SAP was the share sapmnt. On Windows, every one of the CIs gets installed under this path. Therefore the normal SAP setup failed when trying to install a second clustered CI, since sapmnt already existed as a cluster resource. Several customers tried to manually configure multiple CI instances within one SAP cluster group. However, this ended in a failover of all instances as soon as one of the instances initiated a failover.

One could argue that failover should mainly protect against hardware failure, so the usual failover case is hardware-triggered and applies to all instances on the server anyway. But there were also scenarios at customers in the past where a software failure on the CI could lead to a failover of one specific instance only. Nevertheless, so far SAP did not want to support this scenario. However, SAP left a door open for hardware vendors willing to support more sophisticated MSCS SAP configurations. One of the hardware vendors leveraging this was HP, offering a service called HP Competent Cluster Service (see OSS note #826119).

However, the above describes the past and does not apply to the Netweaver 2004S world anymore. Things changed dramatically on the SAP side in terms of supporting what we call MultiSID clusters. The initial reason for this change was the introduction of the so-called ‘Standalone Replicated Enqueue’. Initially developed for the SAP Java stack, it soon became part of the ABAP stack as well. The basic concepts are (a conceptual sketch follows after this list):

·         The message server process and the enqueue server process run self-contained, without the shell of a complete ABAP or Java instance.

·         The enqueue server process replicates all enqueue data in a 2-phase-commit manner to a second node, where an enqueue replication process keeps an enqueue table (a memory table) at the most recent state.

·         On Windows this runs in an MSCS cluster environment. Hence, when one node goes down, the cluster fails over to the other node, where the starting enqueue server process uses the enqueue table that until then was maintained by the enqueue replication server process. This means no enqueue entries are lost.

·         As soon as the failing node comes up again, it becomes the replication target for the enqueue server process.
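
To make the replication idea concrete, here is a minimal conceptual sketch in Python. This is not SAP code; the class names, the lock key, and the two-node setup are invented for illustration. It only models the core guarantee described above: an enqueue entry counts as written once the replication side has acknowledged it, so an enqueue server starting on the surviving node can adopt the replica's table without losing entries.

```python
# Conceptual model only -- not SAP code; all names invented for illustration.

class EnqueueReplica:
    """Replication server side: keeps a copy of the enqueue (lock) table."""
    def __init__(self):
        self.table = {}

    def apply(self, key, value):
        self.table[key] = value   # replica acknowledges the replicated entry
        return True


class EnqueueServer:
    """Enqueue server side: owns the lock table, replicates every change."""
    def __init__(self, replica, table=None):
        self.replica = replica
        self.table = {} if table is None else table

    def enqueue(self, key, value):
        self.table[key] = value
        # an entry only counts as committed once the replica confirmed it
        if not self.replica.apply(key, value):
            raise RuntimeError("replication failed, lock not granted")


# Node A runs the enqueue server, node B the replication server.
replica_on_b = EnqueueReplica()
server_on_a = EnqueueServer(replica_on_b)
server_on_a.enqueue(("E", "MARA", "4711"), "owner=workprocess-12")

# Node A fails. MSCS starts the enqueue server on node B, which adopts the
# table the replication server maintained: no enqueue entries are lost.
server_on_b = EnqueueServer(EnqueueReplica(), table=replica_on_b.table)
assert ("E", "MARA", "4711") in server_on_b.table
```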

This means the last single point of failure, the potential loss of enqueue entries, is solved with this solution. Pages could be written on how this works exactly, but we want to keep it short here. As one can imagine, the pure CPU consumption of the enqueue and message servers is very small. With only one Standalone Replicated Enqueue running on two nodes, you would be looking at rather wasted resources. Therefore SAP now allows the installation of multiple Standalone Replicated Enqueues in one MSCS configuration. The way described in the SAP installation guide works with separate cluster disk resources, so that every one of the standalone instances can fail over independently. SAP also allows installing multiple standalone instances over more than two cluster nodes. This as well is described in the installation documentation.

The documentation can be found on the SAP Service Marketplace under https://service.sap.com/instguidesNW70

From there, go to ‘Installation’ in the left-hand pane. In the content appearing in the main pane, expand the item ‘2 – Installation – SAP Netweaver System’. There you’ll find the official installation documentation. As you can imagine, SAP also posted new installation DVDs on the SAP Service Marketplace which allow executing such installations. The DVDs can be found in the usual download areas for Netweaver 2004S installations and upgrades. Please look in the OS/DB-specific areas.

One of the weaknesses of the Standalone Replicated Enqueue concept as installed by default from SAP is that the gateway process is not included in the failover, as it was with the traditional CI. This can hurt in situations where external 3rd-party products need to register at a gateway and hence require the gateway to be available under the same name all the time. This can be changed as described in OSS note #1010990. The other question is how to get a system which you upgraded to Netweaver 2004S-based applications from the old CI configuration to a new Standalone Replicated Enqueue configuration. This step is described in SAP OSS note #1011190.
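
As a small illustration of why the gateway name matters: a registered RFC server identifies its gateway by host name and service. The snippet below sketches such registration parameters; all values are hypothetical. If the host name refers to a physical node instead of the clustered virtual name, the registration target disappears on a failover.

```python
# All values are hypothetical. The parameter names mirror the usual RFC
# registration settings (gateway host, gateway service, program ID).
registration = {
    "gwhost": "sapci-virt",                # virtual cluster name, fails over
    "gwserv": "sapgw00",                   # gateway service of instance 00
    "program_id": "THIRDPARTY_CONNECTOR",  # ID the external product registers
}
print("register at %(gwhost)s:%(gwserv)s as %(program_id)s" % registration)
```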

The SAP solution of Standalone Replicated Enqueue can be combined in an ideal way with SQL Server 2005 Database Mirroring. Database Mirroring takes care of high availability on the SQL Server side, whereas Standalone Replicated Enqueue with eventual Multi-SID clustering takes care of the SAP side (a client connection sketch follows after the list below). However, before going full steam into a solution that concentrates all Standalone Replicated Enqueue instances on just one MSCS cluster, think about aspects such as:

·         You might want to run some systems on different OS releases or OS service packs. This means having production and test instances on the same cluster is not a good idea from the beginning.

·         You also might face extremely different maintenance windows for some of your systems. E.g., we have customers where downtime for the SAP SCM system on the weekend is an absolute NO, but the ERP system could be taken down for one or two hours in the night from Saturday to Sunday, whereas the SCM system could be taken down between Tuesday and Thursday without any problem. This means that having the Standalone Replicated Enqueue instances of both ERP and SCM on one MSCS cluster leaves you with hardly any maintenance window at all.

So think about which systems to combine in one cluster and where to better use different cluster solutions. If you are concerned about wasting resources, SAP does allow running a non-clustered Dialog Instance on each of the cluster nodes that run Standalone Replicated Enqueue instances. That also could solve the concern about not wanting to waste resources.
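
To sketch what the Database Mirroring side mentioned above looks like from a client: the SQL Server 2005 Native Client connection string accepts a Failover_Partner keyword naming the mirror. The Python/pyodbc example below is a minimal sketch; the server names ‘sqlprin’ and ‘sqlmirr’ and the database ‘PRD’ are assumptions for illustration.

```python
import pyodbc

# Assumed names: 'sqlprin' is the mirroring principal, 'sqlmirr' the mirror,
# 'PRD' the SAP database. With Failover_Partner set, the client can redirect
# to the mirror if the principal is lost -- complementing the SAP-side
# Standalone Replicated Enqueue failover.
conn_str = (
    "DRIVER={SQL Native Client};"
    "SERVER=sqlprin;"
    "Failover_Partner=sqlmirr;"
    "DATABASE=PRD;"
    "Trusted_Connection=yes;"
)
conn = pyodbc.connect(conn_str)
print(conn.cursor().execute("SELECT @@SERVERNAME").fetchone()[0])
conn.close()
```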

What about running multiple instances of SQL Server clustered in one MSCS configuration? That is possible for SAP application usage as well. However, as in the SAP application case, please think about the combinations you want to configure. Also keep in mind that you need to control the resource consumption of the different SQL Server instances in such a way that enough resources remain available on the node which eventually has to carry the complete workload in case every other node is down.
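
One concrete lever for that resource control is SQL Server's ‘max server memory’ setting, configured per instance via sp_configure. The sketch below caps two hypothetical named instances so that their combined caps still fit into one node's RAM; the instance names and values are assumptions, not recommendations.

```python
import pyodbc

# Hypothetical named instances and caps (in MB), sized so that the sum plus
# OS/SAP overhead still fits into the RAM of a single surviving node.
caps_mb = {
    r"SAPCLU1\PRD": 6144,
    r"SAPCLU1\SCM": 6144,
}

for server, cap in caps_mb.items():
    # autocommit, since sp_configure/RECONFIGURE should not run inside
    # a user transaction
    conn = pyodbc.connect(
        "DRIVER={SQL Native Client};SERVER=" + server + ";Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()
    cur.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
    cur.execute(
        "EXEC sp_configure 'max server memory (MB)', %d; RECONFIGURE;" % cap
    )
    conn.close()
```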

