Blog Post

Azure SQL Blog
5 MIN READ

Backup restore speed improvement for SQL Managed Instance

NikoNeugebauer's avatar
Aug 24, 2022

The importance of fast database restore operations in any environment cannot be overstated – it is essentially time business cannot carry on; everything is offline.

 

There is a known DBA saying that “one should not have a backup strategy, but a recovery strategy”, which means that just taking backups isn’t enough, and that the restore process should also be regularly tested, so that you measure and know how much time it takes to restore a production database if and when the need arises.

 

Azure SQL Managed Instance allows customers not only to recover their databases from when FULL or DIFFERENTIAL backups were taken, but it also allows customers to recover their database to any given point in time by offering PITR – Point-In-Time Restore functionality. Besides being able to restore data from recent backups, Azure SQL Managed Instance also supports recovering databases from discrete, individual backups from long term retention (LTR) storage – which can be configured up to 10 years.

 

Once the restore process has been initiated, there is nothing that a customer can do to help it – as the process executes asynchronously and cannot be cancelled. While customers can scale up Azure SQL Managed Instance before a restore operation to increase backup restore speed, it’s only possible to do so before a planned restore, and not when there is a sudden and unexpected need.  Otherwise, no matter how fast the scaling operation executes, it would still further delay the overall speed of the database restoration process.

Improving restore speed


There are multiple phases of the SQL MI database restore process, which, aside from the usual SQL Server restore operations, includes registering databases as an Azure asset for visibility & management within the Azure Portal. Additionally, the restore process forces a number of specific internal options, and some property changes such as forcing the switch to the FULL recovery model and forcing the database option PAGE_VERIFY to CHECKSUM, as well as eventually performing a full backup to start the log chain and provide full point-in-time- restore options through the combination of full and log database backups.

 

The restore operation on SQL MI also includes log truncation, and the execution time for the truncation has been vastly improved, which means that customers can expect their entire database restore process to become faster on both service tiers.

Service tiers differences


It is important to understand that faster means different speeds on different service tiers. This has to do with an additional necessary operation for the Business Critical service tier. As we have described the internal architecture in High Availability in Azure SQL MI: Business Critical service tier, the Business Critical Service tier runs on a synchronous Availability Group with 4 replicas, meaning the initial backup must be replicated to all the replicas in order to complete the setup. The current implementation of the restore process on the Business Critical service tier uses direct seeding of the Availability Group to distribute the newly restored database between replicas. As such, the restore operation will not complete until the backup has been restored to every replica, and a full backup can’t be taken until direct seeding to the secondary replicas finishes.

 

The following diagram shows a conceptual explanation of some of the most time-consuming execution phases of the database restore process on Azure SQL Managed Instance.

 

Although the boxes are similar sizes, they don’t represent the real amount of time spent on each of those specific functions. The real amount of time vastly differs based on the number of transaction log backups, the ratio of data compression, and, of course, the sheer speed of write and log operations that are available to the Azure SQL Managed Instance. Another important factor to consider are the operations executed against the Azure SQL Managed Instance during the database restore process – either directly by the user, or automatically by the engine – such as Tuple Mover in Columnstore Indexes, or even automated operations configured by the customer, such as index maintenance jobs.

 

Test setup & obtained results


For the performance testing, we created two identical Azure SQL Managed Instances - one with the log truncation improvement included and activated, while the second one does not use the improvement.  The two instances use the following hardware specifications:

  • Service tier: General Purpose
  • Hardware generation: Standard Series (Gen 5)
  • CPU vCores: 8
  • Storage: 4 TB
  • Region: East US 2

We have tested three databases, each one with a different size and a different fullness percentage, representing ranges from 60 GB up to 800 GB.

  • db1: size of 800GB
  • db2: size of 600GB
  • db3: size of 60GB

Here are the results, in seconds, for multiple restore tests for each of the databases presented both for the default Azure SQL Managed Instance and for the Azure SQL Managed Instance that has the feature enabled.

 

60 GB

600 GB

800 GB

Without Feature

266 s

1718 s

5048 s

With Feature enabled

155 s

704 s

2054 s

We went a step further, experimenting with the 800 GB database and tested different situations that included not just a fresh FULL backup but also such configurations as FULL backup with a number of Transaction Log backups, FULL and DIFFERENTIAL and Transaction Log backups. The observed results can be seen in the table below:

 

Full only

Full + logs

Full + Diff + Logs

Without Feature

84 min

106 min

120 min

With Feature enabled

34 min

46 min

90 min

 

Here are those results visualized for better understanding of the impact:

 

Benefit estimates

 

Exact benefits are very specific to the configuration and workload of each Azure SQL Managed Instance. On the General Purpose service tier, the exact size of database data and log files directly impacts the benefit, as it does with any kind of workload. If you are interested in more details, please consult our respective documentation article How-to improve data loading performance on SQL Managed Instance with General Purpose service tier. Additionally, customers should be mindful of the exact geographical region of their backup file locations, and the number of files, as a high number of transaction log files significantly impact restore times, as well as the number of differential backups, and their size.

 

For the databases smaller than 1 TB in the General Purpose service tier, we expect database restoration speed to increase from 20% to 70%. For databases that are greater than 1 TB, expect restoration speed improvements between 30% to 70%.

However, overall, customers on both service tiers should see a performance improvement to their database backup restoration process – but the improvement will vary for each case.



So, what is the catch, you might ask…?
How can you get an Azure SQL Managed Instance with this feature?
How can you enable faster restores?

The best news is that this feature is coming right now to all of our customers, and they won’t have to enable it, their restore operations will simply start working faster.

Updated Aug 24, 2022
Version 1.0
No CommentsBe the first to comment