SAP Landscape sizing and volume consolidation with ANF
This blog does not claim to be all-embracing and should not be seen as a single source of truth. I only want to open a much broader sizing discussion and present a different view on this topic. The second part tries to explain how volume consolidation works.

Sizing an SAP landscape is usually a very difficult task because there are so many different, sometimes unknown, parameters and values to take into consideration. Most sizing tools only look at a single system. This is fine for the VM (CPU) sizing; however, when it comes to an optimized storage design, most tools do not consider the complete SAP landscape and are therefore not optimized for the best TCO for the customer. Even when a storage design looks more expensive at first glance, it can be the basis for a much better TCO once all IT costs are taken into consideration. Storage changes and optimizations in particular are usually very complex tasks which sometimes even require longer system downtimes. To avoid unnecessary outages, the SAP landscape needs a very flexible storage environment which allows the customer to grow and to react very quickly to changes or new requirements from the application. All of this together guarantees an optimized TCO and a smooth and reliable SAP landscape for our customers.

Most of the effort and cost goes into landscape management and administration; only 25% of the overall cost goes into the infrastructure investment. Source: https://flylib.com/books/en/4.91.1.14/1/

No support for an NVA in the data path!

For performance and latency reasons it is not supported to configure a Network Virtual Appliance (NVA) in the data path between the SAP application server and the database, nor between the database server and ANF. This is also stated in SAP Note 2731110: https://launchpad.support.sap.com/#/notes/2731110

Performance tuning

To optimize an SAP landscape, it is essential to monitor the used capacity (CPU, network and storage) continuously and to evaluate the business needs against this data, so that the landscape can be aligned and optimized quickly to meet the business requirements. IT must catch up with the business, not the other way around. It is a continuous process: Monitor -> Evaluate -> Adjust -> Monitor ...

Storage landscape sizing based on Azure NetApp Files (ANF)

Before the "Manual QoS Capacity Pool" feature (public preview) was introduced, there was a fixed performance ratio per volume: depending on the capacity pool QoS we got 16, 64 or 128 MB/s per terabyte of volume size. With Manual QoS, storage sizing can be optimized much better than with the previous fixed ratio between performance and volume size, because the throughput can now be set individually per volume. Even small volumes can benefit from a higher throughput, which helps to optimize the overall design. The challenge is to find a good mix of "slow" and "fast" volumes in the Manual QoS capacity pool; this gets much easier with larger capacity pools. I will give some sizing examples that demonstrate how easy it is when we focus on the landscape rather than on a single-system design.
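To illustrate the difference between the fixed ratio and Manual QoS, here is a small Python sketch of the throughput model. It is my own simplified illustration based on the 16/64/128 MB/s per TB ratios mentioned above, not an official Azure tool; the exact behavior is defined in the ANF documentation.

```python
# Simplified model of ANF volume throughput (illustrative sketch only).
# Service-level ratios: MB/s per TB of provisioned size.
MBPS_PER_TB = {"Standard": 16, "Premium": 64, "Ultra": 128}

def auto_qos_volume_mbps(volume_size_tb: float, service_level: str) -> float:
    """Auto QoS: volume throughput is strictly proportional to the volume size."""
    return volume_size_tb * MBPS_PER_TB[service_level]

def manual_qos_pool_budget_mbps(pool_size_tb: float, service_level: str) -> float:
    """Manual QoS: the capacity pool provides a throughput budget that can be
    distributed freely across the volumes in the pool."""
    return pool_size_tb * MBPS_PER_TB[service_level]

# A 0.5 TB log volume in a Premium pool gets only 32 MB/s with the fixed ratio ...
print(auto_qos_volume_mbps(0.5, "Premium"))        # 32.0
# ... while a 10 TB Premium Manual QoS pool has 640 MB/s to distribute, so the
# same small volume could be assigned e.g. 250 MB/s if the other volumes need less.
print(manual_qos_pool_budget_mbps(10, "Premium"))  # 640.0
```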
ANF storage QoS classes

In ANF three different storage QoS classes (service levels) are available. Note that this QoS setting on the capacity pool only controls the performance quota; the data is always written to the same ANF backend. We differentiate between:

Capacity QoS | Performance per terabyte of volume size
Standard     | 16 MB/s
Premium      | 64 MB/s
Ultra        | 128 MB/s

Of course, different costs are associated with the different QoS classes. Pricing details are available at: https://azure.microsoft.com/en-us/pricing/details/netapp/

Storage sizing ... a different approach

To find the optimal capacity pool size, we first need to calculate the individual storage requirements per system and then integrate these individual numbers into the capacity pool calculation. This then presents the "big picture". The big benefit of ANF here is that nothing is written in stone: size and performance can be adjusted dynamically during normal operation, and changing the size or the performance quota does not require any downtime for the customer.

System RAM vs. HANA DB size

It is essential to understand that the system RAM cannot be taken as the HANA database size. RAM is also needed for OS operations, HANA delta tables, HANA temporary tables and so on, so the actual database size in memory is about 50% of the RAM. This is also the "golden sizing rule" for SAP HANA. Source: SAP TDIv5 memory guide.

To calculate the required storage for a system, the following sizing table should help to determine the overall storage requirement. What matters here is not the individual data, log, shared or backup volume, but the total value. For larger VMs (4 TB and above) these values are very rough estimations, and the backup concept has a massive impact on the size of the backup volume.

Shared volume: shared between a group of systems, e.g. DEV, QAS and PRD; it also contains /usr/sap.
Backup volume: shared between all instances (1x DB data + 2x log volume + X).
The snapshot reserve is already included in the data volume. Remember: RAM size is not equal to DB size.

VM Main Memory | Data Vol (GB) | Log Vol (GB) | Shared (GB) | Backup (GB) | Total (GB)
256 GB   | 300  | 300 | 300  | 900  | 1800
512 GB   | 500  | 300 | 500  | 1100 | 2400
1024 GB  | 1000 | 500 | 1000 | 2000 | 4500
2048 GB  | 2000 | 500 | 2000 | 3000 | 7500
4096 GB  | 4000 | 500 | 4000 | 5000 | 13500
6192 GB  | 6000 | 500 | 6000 | 7000 | 19500

Table 1 – overall storage requirement

Volume performance and sizing

As a basis for a starting design, we estimate a performance quota for sandbox, DEV, QAS and PRD systems (we stick to the SAP HANA storage KPIs only for PRD systems). If customers also use their QAS systems for performance testing with the same dataset, it makes sense to design the QAS storage performance accordingly. Of course this can and must be adapted dynamically if the customer requires something different. SAP provides KPIs only for data and log volumes, and those KPIs are identical for all database sizes; this cannot be adopted as general guidance for productive environments, because we know the required throughput deviates heavily from the KPIs for larger systems. The DB startup time depends on the read throughput of the data volume, e.g.:

1 TB DB with 400 MB/s read: startup time +/- 20 min
2 TB DB with 400 MB/s read: startup time +/- 40 min
6 TB DB with 400 MB/s read: startup time +/- 1 h 20 min

At least here we see a discrepancy when only the minimum KPIs are applied. These MINIMAL requirements are meant for PRD systems; in general they are 250 MB/s write for the log volume and 400 MB/s for the data volume. It is important to understand that we need to provide more throughput for larger systems. As mentioned, this is a starting point.
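The golden sizing rule and Table 1 translate directly into a small lookup helper. This is only a convenience sketch of the numbers above; the values remain rough estimations.

```python
# Rough per-system numbers from the text above (illustrative sketch only).

# Golden sizing rule: the usable HANA DB size is roughly 50% of the VM RAM.
def hana_db_size_gb(vm_memory_gb: float) -> float:
    return vm_memory_gb * 0.5

# Total storage requirement per system in GB, taken from Table 1
# (data + log + shared + backup).
TOTAL_STORAGE_GB = {
    256: 1800,
    512: 2400,
    1024: 4500,
    2048: 7500,
    4096: 13500,
    6192: 19500,
}

print(hana_db_size_gb(1024))   # ~512 GB of actual DB data in a 1 TB VM
print(TOTAL_STORAGE_GB[1024])  # 4500 GB total storage requirement for that system
```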
System Type (256 GB + 512 GB RAM) | % of KPI | Data Volume | Log Volume
Sandbox | 25%  | 100 MB/s | 50 MB/s
DEV     | 25%  | 100 MB/s | 50 MB/s
QAS     | 50%  | 200 MB/s | 125 MB/s
PRD     | 100% | 400 MB/s | 250 MB/s

Table 2 – throughput per volume

System Type (1024 GB RAM) | % of KPI | Data Volume | Log Volume
Sandbox | 25%  | 100 MB/s | 50 MB/s
DEV     | 25%  | 150 MB/s | 75 MB/s
QAS     | 50%  | 250 MB/s | 125 MB/s
PRD     | 100% | 500 MB/s | 250 MB/s

Table 3 – throughput per volume (startup time for a 0.5 TB DB: +/- 15 min)

System Type (2048 GB RAM) | % of KPI | Data Volume | Log Volume
Sandbox | 25%  | 150 MB/s | 100 MB/s
DEV     | 25%  | 150 MB/s | 100 MB/s
QAS     | 50%  | 300 MB/s | 150 MB/s
PRD     | 100% | 600 MB/s | 300 MB/s

Table 4 – throughput per volume (startup time for a 1.2 TB DB: +/- 30 min)

System Type (4 TB + 6 TB RAM) | % of KPI | Data Volume | Log Volume
Sandbox | 25%  | 200 MB/s | 100 MB/s
DEV     | 25%  | 200 MB/s | 100 MB/s
QAS     | 50%  | 400 MB/s | 200 MB/s
PRD     | 100% | 800 MB/s | 400 MB/s

Table 5 – throughput per volume (startup time for a 3 TB DB: +/- 60 min)

Sizing a landscape with 10 systems

So how could an ANF storage design for 10 HANA databases look? Let's assume we have 4x 256 GB (M32ls) DEV, 4x 1 TB (M64s) QAS and 2x 1 TB (M64s) PRD systems.

System type | Storage requirement (Table 1) | Performance requirement (Tables 2-5)
DEV    | 4x 1800 GB = 7 TB  | Data = 4x 100 MB/s; Log = 4x 50 MB/s
QAS    | 4x 4500 GB = 18 TB | Data = 4x 250 MB/s; Log = 4x 150 MB/s
PRD    | 2x 4500 GB = 9 TB  | Data = 2x 500 MB/s; Log = 2x 300 MB/s
Backup |                    | DEV = 100 MB/s, QAS = 200 MB/s, PRD = 500 MB/s
Shared |                    | DEV = 50 MB/s, QAS = 50 MB/s, PRD = 100 MB/s

Total storage = 34 TB (from Table 1)
Total throughput = 3800 MB/s (data and log) + 800 MB/s (backup, estimated) + 200 MB/s (shared, estimated) = 4800 MB/s

Translated to an ANF capacity pool:

Premium: 35 TB x 64 MB/s = 2240 MB/s (we make the backup volume a bit smaller so that it fits into the calculation; it can grow on demand, but that takes some time)
Ultra: 20 TB x 128 MB/s = 2560 MB/s
Total: 4800 MB/s

Cost estimation

https://azure.microsoft.com/us-en/pricing/calculator/?service=netapp

Mix of Premium and Ultra: 35 TB x 64 MB/s = 2240 MB/s + 20 TB x 128 MB/s = 2560 MB/s -> 4800 MB/s
Ultra only: 38 TB x 128 MB/s = 4864 MB/s (34 TB would be enough for the capacity; 38 TB is needed to cover the throughput)

Conclusion: In this case it is much more efficient to choose Ultra over Premium: only very little overprovisioning and a very easy deployment, because everything is in one pool. The volumes will, of course, still be distributed over several controllers.
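The whole example can be reproduced with a few lines of Python. This is only a back-of-the-envelope sketch using the numbers from Tables 1-5 above; it is not a sizing tool.

```python
# Back-of-the-envelope sizing for the 10-system example above
# (4x DEV @ 256 GB, 4x QAS @ 1 TB, 2x PRD @ 1 TB). Illustrative only.

# (count, total storage per system in GB from Table 1,
#  data MB/s and log MB/s per system from Tables 2-5)
systems = [
    (4, 1800, 100, 50),    # DEV, 256 GB RAM
    (4, 4500, 250, 150),   # QAS, 1 TB RAM
    (2, 4500, 500, 300),   # PRD, 1 TB RAM
]

total_storage_tb = sum(count * gb for count, gb, _, _ in systems) / 1000
data_log_mbps = sum(count * (data + log) for count, _, data, log in systems)
backup_mbps, shared_mbps = 800, 200            # estimates from the text
total_mbps = data_log_mbps + backup_mbps + shared_mbps

print(round(total_storage_tb))   # ~34 TB
print(data_log_mbps)             # 3800 MB/s for data and log
print(total_mbps)                # 4800 MB/s overall

# Capacity pool options (Premium 64 MB/s per TB, Ultra 128 MB/s per TB):
premium_ultra_mix = 35 * 64 + 20 * 128   # 4800 MB/s on 55 TB provisioned
ultra_only = 38 * 128                    # 4864 MB/s on 38 TB provisioned
print(premium_ultra_mix, ultra_only)
```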
Consolidation of ANF volumes

Consolidating HANA log volumes

One additional option is to share log volumes. Log volumes are not a point of interest for backup scenarios, since they contain open files (the database log files) which cannot be backed up properly anyway. Not all databases create the same amount of log information or require the same throughput on the log volume, so it can be very beneficial to share a log volume among several database systems: the whole group of databases writing into this shared log volume benefits from a much higher performance quota.

1 PRD system  => 250 MB/s
2 PRD systems => +125 MB/s (+50%) = 375 MB/s
3 PRD systems => +125 MB/s = 500 MB/s
4 PRD systems => +125 MB/s = 625 MB/s
5 PRD systems => +125 MB/s = 750 MB/s

How to create a consolidated structure with ANF

The main reason for consolidation is to achieve more with less: fewer volumes and fewer resources, but more performance and easier administration. Since the performance of ANF is related to the size of the volume, it is essential to create a large volume to benefit from this performance quota. Create a meaningful directory structure in this volume to keep a good overview of the installed SAP systems; from the application node's point of view this structure is basically invisible.

Understanding how ANF is structured and how it needs to be configured

Before you can create volumes in ANF for your SAP environment, you need to create a NetApp account, then a capacity pool and finally the volumes in the capacity pool. More info: https://docs.microsoft.com/en-gb/azure/azure-netapp-files/azure-netapp-files-understand-storage-hierarchy#capacity_pools

Design concept of multiple SAP systems in one ANF volume

The basic idea behind this is to gain performance and lower the administration overhead compared to single volumes or default Azure storage. Customers tend to separate non-prod and prod environments, and this can of course be done here as well. But instead of managing sometimes tens of volumes, you only need to manage two or maybe three volumes. This example shows only two SAP systems, but it can certainly be applied at a very large scale: create a single volume, here for example "non-prod", and create simple directories for every SAP system in this main volume. For SAP these "nested" mount points are completely invisible.

Deployment of ANF with multiple instances

Data volumes: if you plan to use snapshot-based backups and cloning, shared data volumes are not a good idea. With a shared volume, a snapshot revert is no longer supported, because it would also overwrite the data of the other SAP instances sharing the volume. For all other volumes, volume consolidation is always a good idea. If it is required to restore a single file of one instance, there is always the option to go into the (hidden) snapshot directory and copy the file from the snapshot back to its original location.

A shared landscape can look like this: two DEV, two QAS and two PRD systems in an optimized volume deployment.

Another interesting idea is to consolidate volumes by SID. In this case you benefit from the fact that a snapshot covers all three areas (data, log and shared) together, and the performance quota is also shared among the three areas. There is some more work to do here before you can clone or refresh the HANA database with this approach.
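To make the "nested mount points" idea concrete, here is a minimal sketch of how such a consolidated layout could be mounted. The export path, IP address, SIDs and NFS mount options are purely hypothetical examples and not recommendations; in line with the note above, data volumes are kept separate and only the log and shared areas are consolidated.

```python
# Sketch: fstab-style mount lines for two SAP systems that share one big
# "non-prod" ANF volume via per-SID subdirectories. The export path, IP,
# SIDs and NFS options below are made-up examples for illustration only.
ANF_EXPORT = "10.0.0.4:/nonprod"   # hypothetical ANF mount target
NFS_OPTS = "rw,hard,rsize=262144,wsize=262144,vers=4.1"

def consolidated_mounts(sids):
    """Build mount lines for the log and shared areas of each SID.
    Data volumes stay separate so a snapshot revert remains possible."""
    lines = []
    for sid in sids:
        for area in ("log", "shared"):
            lines.append(
                f"{ANF_EXPORT}/{sid}/{area}  /hana/{area}/{sid}  nfs  {NFS_OPTS}  0 0"
            )
    return lines

for line in consolidated_mounts(["DV1", "QA1"]):
    print(line)
```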