Well-architected framework performance in the Azure SQL Managed Instance
Published Jul 27 2023 08:05 AM 8,127 Views
Microsoft

NikoNeugebauer_0-1690469810145.pngPerformance pillar in the   Well-Architected Framework for Azure  is focused on ensuring that your workload can meet performance requirements and that it can efficiently use resources to deliver results.

 

There are several best practices within this pillar that can help you achieve these goals, including:

  1. Choosing the right compute options and SKU based on your workload's needs can help ensure optimal performance and resource utilization.
  2. Establishing performance baselines: This involves understanding the performance needs of your workload and defining performance metrics to monitor and measure against.
  3. Optimizing storage performance: This involves identifying the right type of storage for your workload and configuring it to meet performance requirements.
  4. Performance Monitoring: This includes setting up monitoring and logging systems to identify and resolve performance issues in real-time.
  5. Performance Tuning: This involves continuously monitoring and analyzing performance data to identify opportunities for optimization and implementing changes to improve workload performance.

In this blog post, let us see how we can implement some of these best practices for Azure SQL Managed Instance.

 

Establishing Performance baselines

baseline is a known value against which later measurements and performance can be compared. Baseline helps us define what is a normal database performance and thus comparing against the baseline provides us with insights into any abnormalities. Ideally, one should take performance measurements at regular intervals over time, even when no problems occur, to establish a server performance baseline. Compare each new set of measurements with those taken earlier.

 

Baselines are very critical whenever you make any changes to your database application. Common scenarios include –

  • Migrating to a new platform such as Managed Instance from on premises
  • Changing the service tier
  • Changing an instance setting.

It helps compare the performance of your workload after the change with your original workload.

 

If we take the example of a migration from an on prem SQL Server to a managed instance, we can use the following methodology for doing a performance comparison.

 

  • Compare the environment settings on SQL Server and Managed Instance such as cores, memory, compatibility levels, etc. 
  • Create performance baseline on source SQL Server with usage, Disk IO metrics, memory utilization and performance of critical queries.
  • Compare performance on Managed Instance with the baseline

 

Compare Environment settings

 

The first thing that you need to do is to ensure that the characteristics of your source SQL Server are comparable with the destination Managed Instance. Managed Instance is the latest version of SQL Server Database Engine hosted in a special environment in the Azure cloud with built-in high-availability capabilities. This means that you might experiences differences in the workload performance compared to your on-premises version of SQL Server. Learn about the key environment differences that might cause the performance differences between Managed Instance and SQL Server and identify the risks that might affect the performance.

 

To minimize the number of factors that could affect performance of your workload, you should try to align the settings on your Managed Instance with the original settings on the source SQL Server.

 

 In the article -Compare environment settings on SQL Server and Azure SQL that may impact performance you can find the scripts  to compare environment settings. Scripts used in this article are found on SQL Server GitHub repository:

  1. Get-properties.sql that reads various instance, database, tempdb, trace flag values and produces a XML document with all properties. You should execute this script once on the source database and then on the destination database.
  2. Compare-properties.sql compares the values in the XML generated by the previous script and shows the differences.

Try to resolve the differences or understand how the differences would affect performance.

 

Create performance baseline

 

Select a set of queries that are important to, and representative of your business workload. Measure and document the min/average/max duration and CPU usage for these queries, as well as performance metrics on the source server, such as average/max CPU usage, average/max disk IO latency, throughput, IOPS, average / max page life expectancy, and average max size of tempdb.

The following resources can help define a performance baseline:

  • Monitor CPU usage
  • Monitor memory usage and determine the amount of memory used by different components such as buffer pool, plan cache, column-store pool, In-Memory OLTP, etc. In addition, you should find average and peak values of the Page Life Expectancy memory performance counter.
  • Monitor disk IO usage on the source SQL Server instance using the sys.dm_io_virtual_file_stats view or performance counters.
  • Monitor workload and query performance by examining Dynamic Management Views (or Query Store if you are migrating from SQL Server 2014 and later). Identify average duration and CPU usage of the most important queries in your workload.

Compare Performance

 

After you have defined a baseline, compare similar workload performance on the target SQL Managed Instance. For accuracy, it is important that the SQL Managed Instance environment is comparable to the SQL Server environment as much as possible.

There are SQL Managed Instance infrastructure differences that make matching performance exactly unlikely. Some queries may run faster than expected, while others may be slower. The goal of this comparison is to verify that workload performance in the managed instance matches the performance on SQL Server (on average) and to identify any critical queries with performance that don't match your original performance.

 

Performance comparison is likely to result in the following outcomes:

  • Workload performance on the managed instance is aligned or better than the workload performance on your source SQL Server. In this case, you have successfully confirmed that migration is successful.
  • Many performance parameters and queries in the workload performed as expected, with some exceptions resulting in degraded performance. In this case, identify the differences and their importance. If there are some important queries with degraded performance, investigate whether the underlying SQL plans have changed or whether queries are hitting resource limits. You can mitigate this by applying some hints on critical queries (for example, change compatibility level, legacy cardinality estimator) either directly or using plan guides. Ensure statistics and indexes are up to date and equivalent in both environments.
  • Most queries are slower on a managed instance compared to your source SQL Server instance. In this case, try to identify the root causes of the difference such as reaching some resource limit such as IO, memory, or instance log rate limits. If there are no resource limits causing the difference, try changing the compatibility level of the database or change database settings like legacy cardinality estimation and rerun the test. Review the recommendations provided by the managed instance or Query Store views to identify the queries with regressed performance.

 

SQL Managed Instance has a built-in automatic plan correction feature that is enabled by default. This feature ensures that queries that worked fine in the past do not degrade in the future. If this feature is not enabled, run the workload so SQL Managed Instance can learn the performance baseline. Then, enable the feature and run the workload again with the new settings.

Automatic plan correction identifies execution plan choice regression and automatically fixes the issue by forcing the last known good plan.  

 

Make changes in the parameters of your test or upgrade to higher service tiers to reach the optimal configuration for the workload performance that fits your needs.

If you're experiencing performance differences between Azure SQL Managed Instance and SQL server, see these  tips and diagnostic checks that you can perform to identify the root causes of the performance issue....

 

Azure SQL Managed Instance is based on a virtual core (vCore) model, where a vCore represents a logical CPU, powered by Intel® Xeon® Scalable processors. Your costs depend on the choice and usage of the service tier you require, your hardware configuration, compute resources (the number of vCores and the amount of memory), reserved database storage, and backup storage. You control how you choose the hardware configuration so you can match compute and memory to your workload requirements. Hardware configuration options include standard-series (Gen5), premium-series, and premium-series memory-optimized. Hardware configuration defines the compute and memory limits and other characteristics that impact workload performance.

 

For more information on hardware configuration specifics and limitations – including which Intel® Xeon® Scalable processor is included in each series, see hardware configuration characteristics.

 

Selecting the right managed instance target and SKU for your database can be complicated. To make this process easier, we have the following two tools:

Both these tools collect performance data from your source SQL Server instances hosting your databases and recommend minimum Azure SQL Database, Azure SQL Managed Instance, or SQL Server on Azure VM SKU based on the data collected. The feature provides recommendations related to pricing tier, compute level, and data size.

 

Optimizing storage performance

Managed Instance General Purpose uses Azure Premium Storage to store database files for all databases, except for the tempdb database. From the perspective of the database engine, this storage type is remote, i.e. it is accessed over the network, using Azure network infrastructure. To use Azure Premium Storage, SQL MI General Purpose service tier takes advantage of SQL Server native capability to use database files directly in Azure Blob Storage . This means that there is not a disk or a network share that hosts database files; instead, file path is an HTTPS URL, and each database file is a page blob in Azure Blob Storage.

 

For the General Purpose service tier the size of a database file determines the maximum IOPS and throughput that is achievable against the file, as shown in the following table

 

NikoNeugebauer_0-1691051311770.png

Reference: Resource limits - Azure SQL Managed Instance | Microsoft Learn

    

When the IOPS or throughput generated by the database workload exceeds the limits of a database file/blob, storage throttling takes place. For MI GP instances, a typical symptom of storage throttling is high IO latency.

 

If you observe the symptoms of storage throttling when running workloads on MI GP(General Purpose), we recommend increasing database file size to get more IOPS/throughput from Azure Premium Storage.

 

Managed Instance instances do not use remote Azure Premium Storage, but instead use local SSD storage. Storage performance considerations discussed above do not apply to BC instances.

 

 

Storage performance best practices for MI GP

 

  1. Determine if the storage IO generated by the database workload exceeds Azure Premium Storage limits. A script that examines IOPS/throughput per database file over a period of time, and compares them against the limits, is available on GitHub. If storage IO is near the limits, allocate more IOPS/throughput for the database as described below.

    2. When possible, increase space allocation at the MI instance level. This will allow you to increase the size of individual database files, resulting in higher IOPS/throughput limits per file.

    3. Increase the size of individual database files as much as possible to utilize available instance space, even if a large portion of each file would remain empty. But consider Azure Premium Storages well. For example, if the size of a file is 150 GB, then increasing it up to 256 GB would not provide any additional IOPS/throughput. This larger file would take some of the instance storage space that may be better used to increase IOPS/throughput for other database files on the same instance. In this example, to get the advantage of the next storage performance tier for this file, it would be necessary and sufficient to increase file size to 257 GB. MI GP would then snap up the blob size for this file to 512 GB (P20).

    4. For many write-intensive workloads, increasing the size of the transaction log file provides higher performance improvement than increasing the size of data files.

    5. Make all data files equally sized with the same growth increment. While this recommendation is not specific to MI, it is particularly relevant here. On MI GP, a single small file may negatively impact performance of the entire workload, due to storage throttling becoming more likely for that small file.

    6. Avoid using many small files for a database. While the aggregate IOPS/throughput from many small files may be comparable to the IOPS/throughput provided by fewer large files, each individual small file is more likely to encounter storage throttling, which affects the entire workload.

    7. Consider not only increasing file size, but also adding data files. For example, consider an MI instance with a single database. If this database has a single data file, it can get at most 7500 IOPS for data IO, by making that file larger than 1 TB in size (a P40 or P50 blob would be used depending on the exact file size). But using N files, each larger than 1 TB, would provide up to 7500*N IOPS for data IO, and 7500 IOPS for log IO. For example, you could use six 1.1 TB data files, and one 1.1 TB log file. The total file size would be 7.7 TB, while the total blob size would be 14 TB, remaining within the 16 TB and 35 TB limits respectively. Keep in mind that simply adding empty files to an existing database would not improve performance for most workloads; it is also necessary to reallocate existing data over all files

 

If sufficient storage performance cannot be achieved on a General-Purpose instance even after following the best practices described above, then switching to a Business-Critical instance is recommended.

 

Reference: Storage performance best practices and considerations for Azure SQL DB Managed Instance (General Pur...

 

Monitoring Performance

 

For production workloads running on your Azure SQL Managed Instance, it is very important to continuously understand resource consumption and any potential performance issues such that you can react in time and have your business running smoothly. There exist several options for monitoring and alerting of managed instances that can be considered, be it Microsoft cloud native, on-premises products, custom solutions, or third-party solutions.

 

In understanding and evaluating these options there are several things you need to take into consideration depending on our workload and how critical to you are the apps you are running with managed instance. Some of the considerations you need to evaluate while deciding on a monitoring and alerting system for your managed instance are:

 

  • Telemetry latency - how long it takes for the telemetry to reach you from the database until an alert or a notification is triggered (e.g. near real-time, each minute, or every several minutes).
  • Sampling rate - how frequently is telemetry being sample from the database (e.g. every second, every minute, or every several minutes)
  • Alerting capability - what are all the ways you can be alerted and if there are integrations point available (e.g. email, voice call, SMS, web hooks, integration with a 3rd party ticketing system, etc.)
  • Resource cost - what is the impact of telemetry pooling to the performance of your database
  • Skill set required - how easy or complex is it to setup a solution. (e.g. could a novice user set it up, or do you need DBA \ dev. skills)
  • Cost - important as well is the financial cost for monitoring and alerting in terms of licensing, subscriptions, cost per alert, that is what is the price vs. performance ratio you need for your business continuity

 

The options, available today to monitor Azure SQL Managed Instance resources and its performance taking the above stated objectives into consideration are:

For additional details, please refer to the below link –  http://aka.ms/mi-monitoring

 

Improving performance over time

 

This involves continuously monitoring and analyzing performance data to identify opportunities for optimization and implementing changes to improve workload performance.

 

In traditional on-premises SQL Server, the process of initial capacity planning often is separated from the process of running an application in production. Hardware and product licenses are purchased first, and performance tuning is done afterward. When you use Azure SQL MI, it's a good idea to interweave the process of running an application and tuning it. With the model of paying for capacity on demand, you can tune your application to use the minimum resources needed now, instead of over-provisioning on hardware based on guesses of future growth plans for an application, which often are incorrect. 

 

The following checklist can help in optimizing the managed instance and improving the workload performance.

 

 

By following these best practices, you can ensure that your Azure SQL MI workload is optimized for performance and can efficiently use resources to deliver results.

Version history
Last update:
‎Aug 03 2023 01:29 AM
Updated by: