Feb 18 2021 12:21 AM
This blog does not claim to be all-embracing and should not be seen as a single source of truth. I only want to open a much broader sizing discussion and present a different view on this topic. The second part tries to explain how volume consolidation works.
Sizing an SAP Landscape is usually a very difficult task because there are so many different, sometimes unknown, parameters and values to be taken into consideration.
Most sizing tools only look at a single system. This is certainly fine for the VM (CPU) sizing; however, when it comes to an optimized storage design, most tools do not consider the complete SAP landscape and are therefore not optimized for the best TCO for the customer.
Even when a storage design looks more expensive at first glance, it can be the basis for a much better TCO once all IT costs are taken into consideration. Storage changes and optimizations in particular are usually very complex tasks which sometimes even require longer system downtimes. To avoid unnecessary outages, the SAP landscape needs a very flexible storage environment which allows the customer to grow and to react very quickly to changes or to different requirements from the application.
All of this together ensures an optimized TCO and a smooth and reliable SAP landscape for our customers. Most of the effort and cost goes into landscape management and administration; only 25% of the overall costs go into the infrastructure investment.
Source: https://flylib.com/books/en/4.91.1.14/1/
For performance and latency reasons it is not supported to configure a Network Virtual Appliance in the data path between the SAP application server and the DB, nor between the DB server and ANF.
This is also stated in SAP note: https://launchpad.support.sap.com/#/notes/2731110
To optimize an SAP landscape, it is essential to monitor the used capacity (CPU, network and storage) continuously and to evaluate the business needs against this outcome, so that the landscape can be aligned and optimized quickly to meet the business requirements. IT must catch up with the business, not the other way around. It is a continuous process: Monitor -> Evaluate -> Adjust -> Monitor ...
Before this new feature was introduced there was a fixed performance ratio per volume: depending on the capacity pool QoS we got 16, 64 or 128 MB/s per terabyte of volume size. With the new "Manual QoS Capacity Pool" feature for ANF (public preview) the storage sizing can be optimized much better than with the previous fixed ratio between performance and volume size. This feature allows the throughput to be assigned per volume, so even small volumes can benefit from a higher throughput, which helps to optimize the overall design. The challenge now is to find a good mix of "slow" and "fast" volumes in the manual QoS capacity pool. This gets much easier with larger capacity pools. I will give some sizing examples which demonstrate how easy it is when we focus on a landscape design and not on a single system.
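As a small illustration of how this budget works (a sketch only; the pool size, service level and per-volume throughput values below are made-up examples), the throughput a manual QoS capacity pool can hand out is simply its size multiplied by the MB/s-per-TB factor of its service level, and the sum of the per-volume assignments just has to stay within that budget:

```python
# Sketch: throughput budget of a manual QoS capacity pool (assumed factors: 16/64/128 MB/s per TB).
SERVICE_LEVEL_MBPS_PER_TB = {"Standard": 16, "Premium": 64, "Ultra": 128}

def pool_budget_mbps(pool_size_tb, service_level):
    """Total throughput the pool can hand out to its volumes, in MB/s."""
    return pool_size_tb * SERVICE_LEVEL_MBPS_PER_TB[service_level]

def assignments_fit(pool_size_tb, service_level, volume_mbps):
    """True if the manually assigned per-volume throughputs fit into the pool budget."""
    return sum(volume_mbps) <= pool_budget_mbps(pool_size_tb, service_level)

# Example: a 10 TB Ultra pool (1280 MB/s budget) with one fast and several slow volumes.
print(assignments_fit(10, "Ultra", [400, 250, 100, 100, 50, 50]))  # True (950 <= 1280)
```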
In ANF we have three different storage classes available. Note that this QoS pool setting is only there to manage the capacity of the storage system; the data will always be written to the same ANF backend.
We differentiate between:
| Capacity QoS | Performance per terabyte volume size |
| --- | --- |
| Standard | 16MB/s |
| Premium | 64MB/s |
| Ultra | 128MB/s |
Of course, different costs are associated with the different QoS classes.
The cost calculator is available under: https://azure.microsoft.com/en-us/pricing/details/netapp/
To find the optimal capacity pool size we need to calculate the individual storage requirements and then integrate these individual numbers into the capacity pool calculation. This then presents the "big picture".
The big benefit of ANF here is that nothing is written in stone: size and performance can be adapted dynamically during normal operation. Changing the size or the performance quota does not require any downtime for the customer.
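A minimal sketch of that calculation (assuming a manual QoS pool, the MB/s-per-TB factors from the table above and the 4 TB minimum pool size; the example numbers anticipate the landscape example later in this blog):

```python
import math

# Assumed MB/s-per-TB factors from the QoS table above; capacity pools start at 4 TB.
SERVICE_LEVEL_MBPS_PER_TB = {"Standard": 16, "Premium": 64, "Ultra": 128}

def min_pool_size_tb(total_capacity_tb, total_throughput_mbps, service_level):
    """Smallest pool (in whole TB) that covers both the capacity and the throughput need."""
    by_capacity = math.ceil(total_capacity_tb)
    by_throughput = math.ceil(total_throughput_mbps / SERVICE_LEVEL_MBPS_PER_TB[service_level])
    return max(4, by_capacity, by_throughput)

# Example: 34 TB of volumes that together need about 4800 MB/s -> a 38 TB Ultra pool.
print(min_pool_size_tb(34, 4800, "Ultra"))  # 38
```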
System RAM vs. HANA DB size: it is essential to understand that the system RAM cannot be taken as the DB size of HANA. Space is required in RAM for OS operations, HANA delta tables, HANA temporary tables and so on. The actual DB size in memory is therefore about 50% of the RAM. This is also the "golden sizing rule" for SAP HANA.
Source: SAP TDIv5 Memory guide
To calculate the required storage for a system, the following sizing table should help to determine the overall storage requirements. The individual data, log, shared or backup volume is not what matters here; only the total value is important.
For larger VMs (4TB and above) these values are very rough estimates. The backup concept has a massive impact on the size of the backup volume.
| VM Main Memory | Data Vol (GB) | Log Vol (GB) | Shared (GB) | Backup (GB) | Total (GB) |
| --- | --- | --- | --- | --- | --- |
| 256GB | 300 | 300 | 300 | 900 | 1800 |
| 512GB | 500 | 300 | 500 | 1100 | 2400 |
| 1024GB | 1000 | 500 | 1000 | 2000 | 4500 |
| 2048GB | 2000 | 500 | 2000 | 3000 | 7500 |
| 4096GB | 4000 | 500 | 4000 | 5000 | 13500 |
| 6192GB | 6000 | 500 | 6000 | 7000 | 19500 |
Table 1 – overall storage requirement
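For completeness, Table 1 can be expressed as a simple lookup; the values are copied from the table, and only the total per system feeds the pool sizing (a sketch, not an official sizing formula):

```python
# Table 1 as a lookup: RAM in GB -> (data, log, shared, backup) volume sizes in GB.
STORAGE_GB = {
    256:  (300, 300, 300, 900),
    512:  (500, 300, 500, 1100),
    1024: (1000, 500, 1000, 2000),
    2048: (2000, 500, 2000, 3000),
    4096: (4000, 500, 4000, 5000),
    6192: (6000, 500, 6000, 7000),
}

def total_storage_gb(ram_gb):
    """Overall storage requirement per system; only this total matters for the pool sizing."""
    return sum(STORAGE_GB[ram_gb])

print(total_storage_gb(1024))  # 4500
```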
As a basis for an initial design, we estimate a performance quota for Sandbox, DEV, QAS and PRD systems (we stick to the SAP HANA storage KPIs only for PRD systems). If customers also use their QAS systems for performance testing with the same dataset, it makes sense to design the QAS storage performance accordingly. Of course, this can and must be adapted dynamically if the customer requires something different.
SAP provides KPIs only for the data and log volumes. These KPIs are identical across all database sizes, which cannot be taken as general guidance for productive environments: for larger systems we know that the required throughput deviates heavily from the KPIs. The DB startup time, for example, depends on reading the data volume, e.g.:
1 TB DB with 400 MB/s read: startup time +/- 20 min
2 TB DB with 400 MB/s read: startup time +/- 40 min
6 TB DB with 400 MB/s read: startup time +/- 1 h 20 min -- at the latest here we see a discrepancy with the minimal KPIs
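The relationship behind these examples is linear: at a fixed read throughput, the startup time grows with the amount of data that has to be read. A rough sketch (the ~20 min per TB figure is taken from the examples above; how much of the data volume is actually read at startup depends on preload settings and table distribution):

```python
# Rough startup-time estimate based on ~20 min per TB at 400 MB/s, taken from the examples above.
MIN_PER_TB_AT_400_MBPS = 20

def startup_minutes(db_size_tb, read_mbps=400):
    """Scale the reference value linearly with DB size and inversely with read throughput."""
    return db_size_tb * MIN_PER_TB_AT_400_MBPS * 400 / read_mbps

print(startup_minutes(2, 400))  # ~40 min, as in the second example
print(startup_minutes(2, 800))  # ~20 min if the data volume delivers 800 MB/s instead
```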
These MINIMAL requirements are meant for PRD systems: in general, 250 MB/s write for the log volume and 400 MB/s for the data volume.
It is important to understand that we need to provide more throughput for larger systems. As mentioned, … this is a starting point.
| System Type 256GB + 512GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 100MB/s | 50MB/s |
| DEV | 25% | 100MB/s | 50MB/s |
| QAS | 50% | 200MB/s | 125MB/s |
| PRD | 100% | 400MB/s | 250MB/s |
Table 2 – throughput per volume
| System Type 1024GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 100MB/s | 50MB/s |
| DEV | 25% | 150MB/s | 75MB/s |
| QAS | 50% | 250MB/s | 125MB/s |
| PRD | 100% | 500MB/s | 250MB/s |
Table 3 – throughput per volume
Startup time DB size 0.5TB = +/- 15Min
| System Type 2048GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 150MB/s | 100MB/s |
| DEV | 25% | 150MB/s | 100MB/s |
| QAS | 50% | 300MB/s | 150MB/s |
| PRD | 100% | 600MB/s | 300MB/s |
Table 4 – throughput per volume
Startup time DB size 1.2TB = +/- 30 Min
| System Type 4TB + 6TB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 200MB/s | 100MB/s |
| DEV | 25% | 200MB/s | 100MB/s |
| QAS | 50% | 400MB/s | 200MB/s |
| PRD | 100% | 800MB/s | 400MB/s |
Table 5 – throughput per volume
Startup time DB size 3TB = +/- 60Min
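Tables 2-5 can be kept in a small lookup structure, which makes it easy to sum up the requirements of a whole landscape in the example that follows (values copied from the tables above; the RAM buckets are just labels):

```python
# Tables 2-5 as a lookup: (RAM bucket, system type) -> (data volume MB/s, log volume MB/s).
THROUGHPUT_MBPS = {
    ("256-512GB", "Sandbox"): (100, 50),  ("256-512GB", "DEV"): (100, 50),
    ("256-512GB", "QAS"):     (200, 125), ("256-512GB", "PRD"): (400, 250),
    ("1024GB",    "Sandbox"): (100, 50),  ("1024GB",    "DEV"): (150, 75),
    ("1024GB",    "QAS"):     (250, 125), ("1024GB",    "PRD"): (500, 250),
    ("2048GB",    "Sandbox"): (150, 100), ("2048GB",    "DEV"): (150, 100),
    ("2048GB",    "QAS"):     (300, 150), ("2048GB",    "PRD"): (600, 300),
    ("4-6TB",     "Sandbox"): (200, 100), ("4-6TB",     "DEV"): (200, 100),
    ("4-6TB",     "QAS"):     (400, 200), ("4-6TB",     "PRD"): (800, 400),
}

data_mbps, log_mbps = THROUGHPUT_MBPS[("1024GB", "QAS")]
print(data_mbps, log_mbps)  # 250 125
```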
So what could an ANF storage design for 10 HANA databases look like?
Let’s assume we have 4x 256GB (M32ls) DEV, 4x 1TB (M64s) QAS and 2x 1 TB (M64s) PRD systems
| System type | Storage Requirements (Table 1) | Performance Requirements (Tables 2-5) |
| --- | --- | --- |
| DEV | 4x 1800GB = 7TB | Data = 4x 100MB/s; Log = 4x 50MB/s |
| QAS | 4x 4500GB = 18TB | Data = 4x 250MB/s; Log = 4x 150MB/s |
| PRD | 2x 4500GB = 9TB | Data = 2x 500MB/s; Log = 2x 300MB/s |
| Backup | | DEV = 100MB/s, QAS = 200MB/s, PRD = 500MB/s |
| Shared | | DEV = 50MB/s, QAS = 50MB/s, PRD = 100MB/s |
Total storage = 34TB (from Table 1)
Total throughput = 3800MB/s (data and log) + 800MB/s (backup, estimated) + 250MB/s (shared, estimated) ≈ 4800MB/s
Translated to an ANF Capacity Pool
https://azure.microsoft.com/us-en/pricing/calculator/?service=netapp
Mix of Premium and Ultra: 35TB x 64MB/s = 2240MB/s plus 20TB x 128MB/s = 2560MB/s, together 4800MB/s (55TB provisioned)
Ultra only: 38TB x 128MB/s = 4864MB/s (38TB provisioned for the 34TB of required capacity)
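The comparison of the two pool options can be reproduced with the same simple arithmetic (a sketch; the pool sizes and factors are the ones from the two lines above):

```python
# Throughput of the two pool options for the 10-database example above.
def pool_throughput_mbps(size_tb, mbps_per_tb):
    return size_tb * mbps_per_tb

mixed = pool_throughput_mbps(35, 64) + pool_throughput_mbps(20, 128)  # 2240 + 2560 = 4800 MB/s, 55 TB provisioned
ultra_only = pool_throughput_mbps(38, 128)                            # 4864 MB/s, 38 TB provisioned for 34 TB of data
print(mixed, ultra_only)  # 4800 4864
```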
Conclusion:
In this case it is much more efficient to choose Ultra over Premium: only very little overprovisioning is needed, and the deployment is very easy because everything is in one pool. The volumes will of course be distributed over several controllers.
One additional option is to share log volumes. Log volumes are not relevant for the backup scenarios, since they contain open files (the database log files) which cannot be backed up properly. Not all databases create the same amount of log information, nor do they require the same throughput on the log volume. It can therefore be very beneficial to share a log volume among several database systems, so that the whole group of databases writing into this shared log volume benefits from its much higher throughput.
The main reason for consolidation is to achieve more with less: less administration and fewer resources, but more performance. Since the performance of ANF is related to the size of the volume, it is essential to create a large volume to benefit from this performance quota. Create a meaningful directory structure in this volume to keep a good overview of the installed SAP systems. From the application node's point of view this structure is basically invisible.
Before you can create volumes in ANF for your SAP environment, you need to create a NetApp account, then a capacity pool, and finally the volumes inside the capacity pool.
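A minimal sketch of that order using the azure-mgmt-netapp Python SDK is shown below. All names, sizes, the region and the subnet ID are placeholders, export policy and network details are omitted, and exact model fields and method signatures may differ between SDK versions, so treat this as an illustration of the account -> pool -> volume hierarchy rather than a ready-to-run deployment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.netapp import NetAppManagementClient
from azure.mgmt.netapp.models import NetAppAccount, CapacityPool, Volume

# Placeholders: subscription, resource group, region, names and subnet ID.
client = NetAppManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, loc = "rg-sap", "westeurope"

# 1. NetApp account
client.accounts.begin_create_or_update(rg, "anf-sap", NetAppAccount(location=loc)).result()

# 2. Manual QoS capacity pool (size is given in bytes, here 38 TiB Ultra)
client.pools.begin_create_or_update(
    rg, "anf-sap", "sap-pool",
    CapacityPool(location=loc, service_level="Ultra", size=38 * 1024**4, qos_type="Manual"),
).result()

# 3. Volume in the pool (4.5 TB quota, 500 MB/s assigned throughput, NFSv4.1)
client.volumes.begin_create_or_update(
    rg, "anf-sap", "sap-pool", "PRD-hana-data",
    Volume(
        location=loc,
        creation_token="PRD-hana-data",          # export path of the volume
        usage_threshold=4500 * 1024**3,          # volume quota in bytes
        service_level="Ultra",
        protocol_types=["NFSv4.1"],
        throughput_mibps=500,                    # manual QoS assignment for this volume
        subnet_id="<delegated-subnet-resource-id>",
    ),
).result()
```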
The basic idea behind this is to gain performance and to lower the administration overhead compared to many individual volumes or default Azure storage.
Customers tend to separate non-prod and prod environments; this can of course be done here as well.
But instead of managing sometimes tens of volumes, you only need to manage two or maybe three volumes.
This example shows only two SAP systems, but of course this can be applied at a very large scale.
Create a single volume, for example for non-prod, and create simple directories for every SAP system in this main volume. For SAP these 'nested' mount points are completely invisible, as the sketch below shows.
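To make the nested directories concrete, the following sketch generates /etc/fstab entries that mount a per-SID subdirectory of one consolidated volume on each host. The ANF endpoint IP, the volume export path, the mount targets and the NFS options are illustrative assumptions only:

```python
# Sketch: each SAP SID mounts its own subdirectory of one consolidated ANF volume.
ANF_IP = "10.0.0.4"                     # assumed IP of the ANF delegated-subnet endpoint
VOLUME_EXPORT = "sap-nonprod"           # assumed export path of the consolidated volume
NFS_OPTS = "rw,hard,vers=4.1,rsize=262144,wsize=262144"  # illustrative NFSv4.1 options

def fstab_entries(sids):
    """One /etc/fstab line per SID, pointing at a <volume>/<SID> subdirectory."""
    return [
        f"{ANF_IP}:/{VOLUME_EXPORT}/{sid} /hana/shared/{sid} nfs {NFS_OPTS} 0 0"
        for sid in sids
    ]

print("\n".join(fstab_entries(["DEV", "QAS"])))
```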
Data volumes: if you plan to use snapshot-based backups and cloning, shared data volumes are not a good idea. With a shared volume, a SnapRevert is no longer supported, because you would also overwrite the data of the other SAP instances sharing the volume. For all other volumes, consolidation is always a good idea.
If it is required to restore a single file from one instance, there is always the option to go into the (hidden) snapshot directory and copy that file out of the snapshot into its original location.
A shared landscape can look like this:
Here you see two DEV, two QAS and two PRD systems in an optimized volume deployment.
Another interesting idea is to consolidate volumes by SID. In this case you benefit from the fact that a snapshot captures all three areas (data, log and shared) together, and the performance quota is also shared among the three areas. There is some additional work to do before you can clone/refresh the HANA database with this approach.