Feb 18 2021 12:21 AM
This blog does not claim to be all-embracing and should not be seen as a single source of truth. I only want to open a much broader sizing discussion and present a different view on this topic. The second part tries to explain how volume consolidation works.
Sizing an SAP Landscape is usually a very difficult task because there are so many different, sometimes unknown, parameters and values to be taken into consideration.
Most sizing tools only look at a single system. This is certainly fine for the VM (CPU) sizing; however, when it comes to an optimized storage design, most tools do not consider the complete SAP landscape and are therefore not optimized for the best TCO for the customer.
Even when a storage design looks more expensive at first glance, it can be the basis for a much better TCO once all IT costs are taken into consideration. Storage changes and optimizations in particular are usually very complex tasks which sometimes even require longer system downtimes. To avoid unnecessary outages, the SAP landscape needs a very flexible storage environment which allows the customer to grow and to react very quickly to changes or to different requirements from the application.
All of this together ensures an optimized TCO and a smooth and reliable SAP landscape for our customers. Most of the effort and cost goes into landscape management and administration; only 25% of the overall costs go into the infrastructure investment.
Source: https://flylib.com/books/en/4.91.1.14/1/
For performance and latency reasons it is not supported to configure a Network Virtual Appliance in the data path between the SAP application server and the DB, nor between the DB server and ANF.
This is also stated in SAP note: https://launchpad.support.sap.com/#/notes/2731110
To optimize an SAP landscape, it is essential to monitor the used capacity (CPU, network and storage) continuously and to evaluate the business needs against this outcome, so that the landscape can be aligned and optimized quickly to meet the business requirements. IT must catch up with the business, not the other way around. It is a continuous process: Monitor -> Evaluate -> Adjust -> Monitor ...
Before this new feature was introduced there was a fixed performance ratio per volume: depending on the capacity pool QoS we got 16, 64 or 128 MB/s per terabyte of volume size. With the new "Manual QoS Capacity Pool" feature for ANF (public preview) the storage sizing can be optimized much better than with the previous fixed ratio between performance and volume size. This feature allows the throughput to be assigned per volume, so even small volumes can benefit from a higher throughput, which helps to optimize the overall design. The challenge now is to find a good mix of "slow" and "fast" volumes in the manual QoS capacity pool. This gets much easier with larger capacity pools. I will give some sizing examples which demonstrate how easy it is when we focus on a landscape design and not on a single system.
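As a small illustration of how this budget works (a sketch only; the pool size, service level and per-volume throughput values below are made-up examples), the throughput a manual QoS capacity pool can hand out is simply its size multiplied by the MB/s-per-TB factor of its service level, and the sum of the per-volume assignments just has to stay within that budget:

```python
# Sketch: throughput budget of a manual QoS capacity pool (assumed factors: 16/64/128 MB/s per TB).
SERVICE_LEVEL_MBPS_PER_TB = {"Standard": 16, "Premium": 64, "Ultra": 128}

def pool_budget_mbps(pool_size_tb, service_level):
    """Total throughput the pool can hand out to its volumes, in MB/s."""
    return pool_size_tb * SERVICE_LEVEL_MBPS_PER_TB[service_level]

def assignments_fit(pool_size_tb, service_level, volume_mbps):
    """True if the manually assigned per-volume throughputs fit into the pool budget."""
    return sum(volume_mbps) <= pool_budget_mbps(pool_size_tb, service_level)

# Example: a 10 TB Ultra pool (1280 MB/s budget) with one fast and several slow volumes.
print(assignments_fit(10, "Ultra", [400, 250, 100, 100, 50, 50]))  # True (950 <= 1280)
```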
In ANF we have three different storage classes available. Note that this QoS pool setting is only there to manage the capacity of the storage system; the data will always be written to the same ANF backend.
We differentiate between:
| Capacity QoS | Performance per terabyte volume size |
| --- | --- |
| Standard | 16MB/s |
| Premium | 64MB/s |
| Ultra | 128MB/s |
Of course, different costs are associated with the different QoS classes.
The cost calculator is available under: https://azure.microsoft.com/en-us/pricing/details/netapp/
To find the optimal capacity pool size we need to calculate the individual storage requirements and then integrate these individual numbers into the capacity pool calculation. This then presents the "big picture".
The big benefit of ANF here is that nothing is written in stone: size and performance can be adapted dynamically during normal operation. Changing the size or the performance quota does not require any downtime for the customer.
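A minimal sketch of that calculation (assuming a manual QoS pool, the MB/s-per-TB factors from the table above and the 4 TB minimum pool size; the example numbers anticipate the landscape example later in this blog):

```python
import math

# Assumed MB/s-per-TB factors from the QoS table above; capacity pools start at 4 TB.
SERVICE_LEVEL_MBPS_PER_TB = {"Standard": 16, "Premium": 64, "Ultra": 128}

def min_pool_size_tb(total_capacity_tb, total_throughput_mbps, service_level):
    """Smallest pool (in whole TB) that covers both the capacity and the throughput need."""
    by_capacity = math.ceil(total_capacity_tb)
    by_throughput = math.ceil(total_throughput_mbps / SERVICE_LEVEL_MBPS_PER_TB[service_level])
    return max(4, by_capacity, by_throughput)

# Example: 34 TB of volumes that together need about 4800 MB/s -> a 38 TB Ultra pool.
print(min_pool_size_tb(34, 4800, "Ultra"))  # 38
```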
System RAM vs. HANA DB size: it is essential to understand that the system RAM cannot be taken as the DB size of HANA. Space is required in RAM for OS operations, HANA delta tables, HANA temporary tables and so on. The actual DB size in memory is therefore about 50% of the RAM. This is also the "golden sizing rule" for SAP HANA.
Source: SAP TDIv5 Memory guide
To calculate the required storage for a system, the following sizing table should help to determine the overall storage requirements. The individual data, log, shared or backup volume is not what matters here; only the total value is important.
For larger VMs (4TB and above) these values are very rough estimates. The backup concept has a massive impact on the size of the backup volume.
| VM Main Memory | Data Vol (GB) | Log Vol (GB) | Shared (GB) | Backup (GB) | Total (GB) |
| --- | --- | --- | --- | --- | --- |
| 256GB | 300 | 300 | 300 | 900 | 1800 |
| 512GB | 500 | 300 | 500 | 1100 | 2400 |
| 1024GB | 1000 | 500 | 1000 | 2000 | 4500 |
| 2048GB | 2000 | 500 | 2000 | 3000 | 7500 |
| 4096GB | 4000 | 500 | 4000 | 5000 | 13500 |
| 6192GB | 6000 | 500 | 6000 | 7000 | 19500 |
Table 1 – overall storage requirement
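For completeness, Table 1 can be expressed as a simple lookup; the values are copied from the table, and only the total per system feeds the pool sizing (a sketch, not an official sizing formula):

```python
# Table 1 as a lookup: RAM in GB -> (data, log, shared, backup) volume sizes in GB.
STORAGE_GB = {
    256:  (300, 300, 300, 900),
    512:  (500, 300, 500, 1100),
    1024: (1000, 500, 1000, 2000),
    2048: (2000, 500, 2000, 3000),
    4096: (4000, 500, 4000, 5000),
    6192: (6000, 500, 6000, 7000),
}

def total_storage_gb(ram_gb):
    """Overall storage requirement per system; only this total matters for the pool sizing."""
    return sum(STORAGE_GB[ram_gb])

print(total_storage_gb(1024))  # 4500
```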
As a basis for an initial design, we estimate a performance quota for Sandbox, DEV, QAS and PRD systems (we stick to the SAP HANA storage KPIs only for PRD systems). If customers also use their QAS systems for performance testing with the same dataset, it makes sense to design the QAS storage performance accordingly. Of course, this can and must be adapted dynamically if the customer requires something different.
SAP provides KPIs only for the data and log volumes. These KPIs are identical across all database sizes, which cannot be taken as general guidance for productive environments: for larger systems we know that the required throughput deviates heavily from the KPIs. The DB startup time, for example, depends on reading the data volume, e.g.:
1 TB DB with 400 MB/s read: startup time +/- 20 min
2 TB DB with 400 MB/s read: startup time +/- 40 min
6 TB DB with 400 MB/s read: startup time +/- 1 h 20 min -- at the latest here we see a discrepancy with the minimal KPIs
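The relationship behind these examples is linear: at a fixed read throughput, the startup time grows with the amount of data that has to be read. A rough sketch (the ~20 min per TB figure is taken from the examples above; how much of the data volume is actually read at startup depends on preload settings and table distribution):

```python
# Rough startup-time estimate based on ~20 min per TB at 400 MB/s, taken from the examples above.
MIN_PER_TB_AT_400_MBPS = 20

def startup_minutes(db_size_tb, read_mbps=400):
    """Scale the reference value linearly with DB size and inversely with read throughput."""
    return db_size_tb * MIN_PER_TB_AT_400_MBPS * 400 / read_mbps

print(startup_minutes(2, 400))  # ~40 min, as in the second example
print(startup_minutes(2, 800))  # ~20 min if the data volume delivers 800 MB/s instead
```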
These MINIMAL requirements are meant for PRD systems: in general, 250 MB/s write for the log volume and 400 MB/s for the data volume.
It is important to understand that we need to provide more throughput for larger systems. As mentioned, … this is a starting point.
| System Type 256GB + 512GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 100MB/s | 50MB/s |
| DEV | 25% | 100MB/s | 50MB/s |
| QAS | 50% | 200MB/s | 125MB/s |
| PRD | 100% | 400MB/s | 250MB/s |
Table 2 – throughput per volume
| System Type 1024GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 100MB/s | 50MB/s |
| DEV | 25% | 150MB/s | 75MB/s |
| QAS | 50% | 250MB/s | 125MB/s |
| PRD | 100% | 500MB/s | 250MB/s |
Table 3 – throughput per volume
Startup time DB size 0.5TB = +/- 15Min
| System Type 2048GB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 150MB/s | 100MB/s |
| DEV | 25% | 150MB/s | 100MB/s |
| QAS | 50% | 300MB/s | 150MB/s |
| PRD | 100% | 600MB/s | 300MB/s |
Table 4 – throughput per volume
Startup time DB size 1.2TB = +/- 30 Min
| System Type 4TB + 6TB RAM | % of KPI | Data-Volume | Log-Volume |
| --- | --- | --- | --- |
| Sandbox | 25% | 200MB/s | 100MB/s |
| DEV | 25% | 200MB/s | 100MB/s |
| QAS | 50% | 400MB/s | 200MB/s |
| PRD | 100% | 800MB/s | 400MB/s |
Table 5 – throughput per volume
Startup time DB size 3TB = +/- 60Min
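Tables 2-5 can be kept in a small lookup structure, which makes it easy to sum up the requirements of a whole landscape in the example that follows (values copied from the tables above; the RAM buckets are just labels):

```python
# Tables 2-5 as a lookup: (RAM bucket, system type) -> (data volume MB/s, log volume MB/s).
THROUGHPUT_MBPS = {
    ("256-512GB", "Sandbox"): (100, 50),  ("256-512GB", "DEV"): (100, 50),
    ("256-512GB", "QAS"):     (200, 125), ("256-512GB", "PRD"): (400, 250),
    ("1024GB",    "Sandbox"): (100, 50),  ("1024GB",    "DEV"): (150, 75),
    ("1024GB",    "QAS"):     (250, 125), ("1024GB",    "PRD"): (500, 250),
    ("2048GB",    "Sandbox"): (150, 100), ("2048GB",    "DEV"): (150, 100),
    ("2048GB",    "QAS"):     (300, 150), ("2048GB",    "PRD"): (600, 300),
    ("4-6TB",     "Sandbox"): (200, 100), ("4-6TB",     "DEV"): (200, 100),
    ("4-6TB",     "QAS"):     (400, 200), ("4-6TB",     "PRD"): (800, 400),
}

data_mbps, log_mbps = THROUGHPUT_MBPS[("1024GB", "QAS")]
print(data_mbps, log_mbps)  # 250 125
```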
So what could an ANF storage design for 10 HANA databases look like?
Let’s assume we have 4x 256GB (M32ls) DEV, 4x 1TB (M64s) QAS and 2x 1 TB (M64s) PRD systems
| System type | Storage Requirements (Table 1) | Performance Requirements (Tables 2-5) |
| --- | --- | --- |
| DEV | 4x 1800GB = 7TB | Data = 4x 100MB/s; Log = 4x 50MB/s |
| QAS | 4x 4500GB = 18TB | Data = 4x 250MB/s; Log = 4x 150MB/s |
| PRD | 2x 4500GB = 9TB | Data = 2x 500MB/s; Log = 2x 300MB/s |
| Backup | | DEV = 100MB/s, QAS = 200MB/s, PRD = 500MB/s |
| Shared | | DEV = 50MB/s, QAS = 50MB/s, PRD = 100MB/s |
Total storage = 34TB (from Table 1)
Total throughput = 3800MB/s (data and log) + 800MB/s (backup, estimated) + 250MB/s (shared, estimated) ≈ 4800MB/s
Translated to an ANF Capacity Pool
https://azure.microsoft.com/us-en/pricing/calculator/?service=netapp
Mix of Premium and Ultra: 35TB x 64MB/s = 2240MB/s plus 20TB x 128MB/s = 2560MB/s, together 4800MB/s (55TB provisioned)
Ultra only: 38TB x 128MB/s = 4864MB/s (38TB provisioned for the 34TB of required capacity)
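The comparison of the two pool options can be reproduced with the same simple arithmetic (a sketch; the pool sizes and factors are the ones from the two lines above):

```python
# Throughput of the two pool options for the 10-database example above.
def pool_throughput_mbps(size_tb, mbps_per_tb):
    return size_tb * mbps_per_tb

mixed = pool_throughput_mbps(35, 64) + pool_throughput_mbps(20, 128)  # 2240 + 2560 = 4800 MB/s, 55 TB provisioned
ultra_only = pool_throughput_mbps(38, 128)                            # 4864 MB/s, 38 TB provisioned for 34 TB of data
print(mixed, ultra_only)  # 4800 4864
```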
Conclusion:
In this case it is much more efficient to choose Ultra over Premium: only very little overprovisioning is needed, and the deployment is very easy because everything is in one pool. The volumes will of course be distributed over several controllers.
One additional option is to share log volumes. Log volumes are not relevant for the backup scenarios, since they contain open files (the database log files) which cannot be backed up properly. Not all databases create the same amount of log information, nor do they require the same throughput on the log volume. It can therefore be very beneficial to share a log volume among several database systems, so that the whole group of databases writing into this shared log volume benefits from its much higher throughput.
The main reason for consolidation is to achieve more with less: less administration and fewer resources, but more performance. Since the performance of ANF is related to the size of the volume, it is essential to create a large volume to benefit from this performance quota. Create a meaningful directory structure in this volume to keep a good overview of the installed SAP systems. From the application node's point of view this structure is basically invisible.
Before you can create volumes in ANF for your SAP environment, you need to create a NetApp account, then a capacity pool, and finally the volumes inside the capacity pool.
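A minimal sketch of that order using the azure-mgmt-netapp Python SDK is shown below. All names, sizes, the region and the subnet ID are placeholders, export policy and network details are omitted, and exact model fields and method signatures may differ between SDK versions, so treat this as an illustration of the account -> pool -> volume hierarchy rather than a ready-to-run deployment.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.netapp import NetAppManagementClient
from azure.mgmt.netapp.models import NetAppAccount, CapacityPool, Volume

# Placeholders: subscription, resource group, region, names and subnet ID.
client = NetAppManagementClient(DefaultAzureCredential(), "<subscription-id>")
rg, loc = "rg-sap", "westeurope"

# 1. NetApp account
client.accounts.begin_create_or_update(rg, "anf-sap", NetAppAccount(location=loc)).result()

# 2. Manual QoS capacity pool (size is given in bytes, here 38 TiB Ultra)
client.pools.begin_create_or_update(
    rg, "anf-sap", "sap-pool",
    CapacityPool(location=loc, service_level="Ultra", size=38 * 1024**4, qos_type="Manual"),
).result()

# 3. Volume in the pool (4.5 TB quota, 500 MB/s assigned throughput, NFSv4.1)
client.volumes.begin_create_or_update(
    rg, "anf-sap", "sap-pool", "PRD-hana-data",
    Volume(
        location=loc,
        creation_token="PRD-hana-data",          # export path of the volume
        usage_threshold=4500 * 1024**3,          # volume quota in bytes
        service_level="Ultra",
        protocol_types=["NFSv4.1"],
        throughput_mibps=500,                    # manual QoS assignment for this volume
        subnet_id="<delegated-subnet-resource-id>",
    ),
).result()
```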
The basic idea behind this is to gain performance and to lower the administration overhead compared to many individual volumes or default Azure storage.
Customers tend to separate non-prod and prod environments; this can of course be done here as well.
But instead of managing sometimes tens of volumes, you only need to manage two or maybe three volumes.
This example shows only two SAP systems, but of course this can be applied at a very large scale.
Create a single volume, for example for non-prod, and create simple directories for every SAP system in this main volume. For SAP these 'nested' mount points are completely invisible, as the sketch below shows.
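To make the nested directories concrete, the following sketch generates /etc/fstab entries that mount a per-SID subdirectory of one consolidated volume on each host. The ANF endpoint IP, the volume export path, the mount targets and the NFS options are illustrative assumptions only:

```python
# Sketch: each SAP SID mounts its own subdirectory of one consolidated ANF volume.
ANF_IP = "10.0.0.4"                     # assumed IP of the ANF delegated-subnet endpoint
VOLUME_EXPORT = "sap-nonprod"           # assumed export path of the consolidated volume
NFS_OPTS = "rw,hard,vers=4.1,rsize=262144,wsize=262144"  # illustrative NFSv4.1 options

def fstab_entries(sids):
    """One /etc/fstab line per SID, pointing at a <volume>/<SID> subdirectory."""
    return [
        f"{ANF_IP}:/{VOLUME_EXPORT}/{sid} /hana/shared/{sid} nfs {NFS_OPTS} 0 0"
        for sid in sids
    ]

print("\n".join(fstab_entries(["DEV", "QAS"])))
```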
Data volumes: if you plan to use snapshot-based backups and cloning, shared data volumes are not a good idea. With a shared volume, a SnapRevert is no longer supported, because you would also overwrite the data of the other SAP instances sharing the volume. For all other volumes, consolidation is always a good idea.
If it is required to restore a single file from one instance, there is always the option to go into the (hidden) snapshot directory and copy that file out of the snapshot into its original location.
A shared landscape can look like this:
Here you see two DEV, two QAS and two PRD systems in an optimized volume deployment.
Another interesting idea is to consolidate volumes by SID. In this case you benefit from the fact that a snapshot captures all three areas (data, log and shared) together, and the performance quota is also shared among the three areas. There is some additional work to do before you can clone/refresh the HANA database with this approach.