With the introduction of Azure SQL Database Managed Instance (SQL MI), as presented here, more and more customers using SQL Server on premise looking in SQL MI as a great way to migrate to Azure. Having an almost 100% compatibility surface, security and isolation options as a fully managed PaaS with attractive licensing options enables many customers to take the leap to Azure SQL with high confidence. As SQL MI also offers a scale up instance up to 80 cores, it enables consolidation scenarios that can offer great ROI benefits.
But, as any other SQL server consolidation effort, attention is required into planning resources and the right combination of workloads and databases to ensure all work right together. You will find several suggestions below coming from similar deployments and projects that may benefit you as well.
Findings and best practices
- Choosing the right SQL MI tier. Business Critical tier should be the main choice as it offers the best latency for IOs, local readable replica plus the option to scale up to 80 cores easily.
- Consider the defined SQL MI resource limits described here, especially the 100 databases max per instance, unless the instance storage size limit has been reached.
- geoDR considerations
- Ensure you group databases in your consolidated instance that are somehow related and they can failover together when a geoDR failover is triggered as currently geoDR is done at a SQL MI instance and not at a database level.
- By default, the geoDR instances use the same number of cores as the primary database. This is to ensure equal capacity in case of a failover plus maintain low latency when applying data changes from the primary.
- Now, we have seen customer cases where they try to manage overall cost by lowering the number of cores in geoDR instance. For example, they may have 64 cores in the primary db and 32 cores on the geoDR instance as they both support up to 4TB storage.
- But, you should monitor the log applying rate to ensure it does not fall far behind. If geoDR instances lag is too high, we will throttle the primary database log rate to ensure geoDR catches up, as said here. That may affect the primary DB performance. If that happens often, you may need to increase the number of cores in geoDR instances.
- You can find details here on how to manage better the geoDR secondary monitoring and setup.
- An additional aspect to consider in sizing the geoDR instances is if you use them for reading and reporting purposes. That will add load on the instance resources and it should be considered in the sizing planning.
- Plan for the log rate throughput levels
- In SQL MI, log rate (MB/s) grows up to 48MB/s, available at 16 cores instances. Adding more cores does not increase the log rate beyond 48MB/s.
- If your applications write in the log extensively or you expect low latency in the log writes, you should consider that in your consolidation planning as the 48MB/s current limit is per instance plus it includes the needs for the tempDB log.
- Manage the MAXDOP value. As you migrate to a higher core instance than before, you may need to limit the MAXDOP per database, at about 8 or less. MAXDOP can be set at instance or database or even query level. if MAXDOP is 0 at the database level, it inherits the value from the instance level. This is more critical for customers running traditional row mode parallelism in queries as that execution model is less flexible than batch mode parallelism. If you have an application being migrated to MI, then using compatibility level 150 may improve performance because it allows batch mode to run over b-tree index storage models. You still will likely want a MAXDOP setting, but this is another option to consider to get the best performance on larger systems as batch mode scales to higher core counts vs. row mode parallelism.
- When scaling up you SQL MI instance, the scaling operation introduces short downtime at the end of the operation, typically up to 10 seconds. Duration of a failover can vary significantly in case of long-running transactions that happen on the databases due to prolonged recovery time. Hence, it’s not recommended to scale compute or storage of Azure SQL Database managed instance or to change service tier at the same time with the long-running transactions (data import, data processing jobs, index rebuild, etc.). Database failover that will be performed at the end of the operation will cancel ongoing transactions and result in prolonged recovery time.
As such, consider also enabling accelerated database recovery in SQL MI, as discussed here.
- Consider running database auto statistics in async mode to minimize the impact and potential blocking with other long running processes. This can help especially if your SQL Server frequently executes the same query or similar cached query plans.
- Note: to use AUTO_UPDATE_STATISTICS_ASYNC, the AUTO_UPDATE_STATISTICS needs to be enabled as well.