How-to improve data loading performance on SQL Managed Instance with General Purpose service tier
Published Jan 18 2022

In this blog post we shall consider some of the strategies for improving data loading performance in Azure SQL Managed Instance. These strategies apply to repeatable ETL processes, meaning that if there is a problem, the data loading process can be repeated without losing any data or compromising its consistency. Of course, these strategies require a design prepared for such repetition, but that is a basic requirement for any data loading process.

 

There are many great ways of ensuring high-performance data loading, and this blog post does not pretend to be an exhaustive or ultimate resource, but it will provide a couple of known paths that can drastically improve log throughput on Azure SQL MI.

 

If you are interested in other posts on how to discover different aspects of SQL MI, please visit http://aka.ms/sqlmi-howto, which serves as a placeholder for the series.

 

Test Setup

For the test case I have created a SQL MI with the General Purpose service tier, 4 CPU vCores and 2 TB of reserved space.

For the source data let's use a table called [TestSource], into which we shall load 2 million rows with the following code. It creates a new database [sourcedb] that hosts the [TestSource] table with a primitive test data structure:

 

 

CREATE DATABASE sourcedb;
GO
USE sourcedb;
GO

DROP TABLE IF EXISTS dbo.TestSource;

CREATE TABLE dbo.TestSource(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

-- Loading Rows 
set nocount on
declare @i as int;
set @i = 1;

BEGIN TRAN
WHILE @i <= 2000000
BEGIN
	INSERT INTO dbo.TestSource 
		VALUES (@i, 'First Name', 'LastName', GetDate());

	SET @i = @i + 1;
END;
COMMIT

 

 

By using the system stored procedure sys.sp_spaceused we can determine how much data we have just loaded into our source table:

 

 

EXECUTE sys.sp_spaceused 'dbo.TestSource';

 

 

[Screenshot: sys.sp_spaceused output for dbo.TestSource]

The space occupied by the data is around 1 GB, which is a small amount, but it is good enough for the purpose of fast testing.

 

Database Data & Log Sizes:

The throughput of the remote storage used by SQL MI with the General Purpose service tier depends on the size of the respective data & log files (for both read and write operations; plus, of course, there is log throughput throttling on write operations).

For the test we create 2 databases and increase the data & log files of the second one, [db2], to 257 GB each, which will bring a significant performance improvement over the default values (given that we have not modified the model database).

 

 

use master;

CREATE DATABASE db1;
CREATE DATABASE db2;

GO
ALTER DATABASE [db2] MODIFY FILE ( NAME = N'data_0', SIZE = 263168000KB )
GO
ALTER DATABASE [db2] MODIFY FILE ( NAME = N'log', SIZE = 263168000KB )
GO

 

 

Checking on the file sizes will show us the expected results: our database [db1], with both files sized under 3 GB, maps to 128 GB disks (P10) internally, while the 257 GB data & log files of database [db2] map to 512 GB disks (P20), as shown in the picture below:

[Image: mapping of database file sizes to Azure premium disk sizes]

 

 

-- Check the database file sizes
SELECT DB_NAME(database_id) as DbName, 
		type, 
		type_desc, 
		name, 
		CAST( size*8./1024/1024 as DECIMAL(9,2)) AS SizeInGB
	FROM sys.master_files
	WHERE DB_NAME(database_id) IN ('db1','db2') AND file_id < 65530;

 

 

[Screenshot: database file sizes for db1 and db2]

Besides the number of IOPS, which can be a less important metric for a lot of ETL processes, the potential difference in throughput between the disks can be around 50% (100 MB/s vs 150 MB/s).
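If you want to verify the throughput that a load is actually achieving, a minimal sketch along these lines can help; it samples sys.dm_io_virtual_file_stats before and after the load (the database name [db2] is reused from this article, and the temporary table #io_before is just an illustrative name):

-- Snapshot the cumulative I/O statistics of the destination database files before the load
SELECT mf.name AS FileName,
		vfs.num_of_writes,
		vfs.num_of_bytes_written
	INTO #io_before
	FROM sys.dm_io_virtual_file_stats(DB_ID('db2'), NULL) AS vfs
	INNER JOIN sys.master_files AS mf
		ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id;

-- ... execute the data load here ...

-- Compare with the current values to estimate how many MB were written per file
SELECT b.FileName,
		CAST((vfs.num_of_bytes_written - b.num_of_bytes_written) / 1024. / 1024. AS DECIMAL(12,2)) AS MBWritten
	FROM sys.dm_io_virtual_file_stats(DB_ID('db2'), NULL) AS vfs
	INNER JOIN sys.master_files AS mf
		ON mf.database_id = vfs.database_id AND mf.file_id = vfs.file_id
	INNER JOIN #io_before AS b
		ON b.FileName = mf.name;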

To test the data loading speed, let's create in both databases ([db1] & [db2]) a destination table, which we will call [DestinationTable], and then load the data into it, executing the load multiple times.

 

 

DROP TABLE IF EXISTS dbo.DestinationTable;

CREATE TABLE dbo.DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

INSERT INTO dbo.DestinationTable 
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM sourcedb.dbo.TestSource;
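To measure the duration of each load consistently, a simple timing wrapper like the sketch below can be used (SET STATISTICS TIME ON would be an equally valid alternative):

-- Capture the start time, run the load, then report the elapsed seconds
DECLARE @start datetime2 = SYSDATETIME();

INSERT INTO dbo.DestinationTable 
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM sourcedb.dbo.TestSource;

SELECT DATEDIFF(SECOND, @start, SYSDATETIME()) AS ElapsedSeconds;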

 

 

The performance results look very surprising: while it takes on average 100 seconds to load the data into database [db1], for the second database [db2] with bigger and faster disks I was able to load the same data in just 57 seconds, a very significant improvement of almost twice the speed.

But can we get the execution running a little bit faster?

Let's look at the execution plan presented in the picture below: it runs with a single thread, making the execution painfully slow.

[Screenshot: single-threaded execution plan for the INSERT ... SELECT statement]

To improve the loading times, if there is still capacity left on our destination disk drives (and on the SQL MI itself), we can try to load the data using parallel processes.

 

Parallel Load

To obtain a parallel execution plan for the load process, in this particular case we shall add the TABLOCK hint, which will allow both the SELECT and the INSERT parts of the statement to execute in parallel.

The topic of getting parallelism for your execution plan is a large and complex one; it might involve different settings and/or hints and might be limited by the features involved.
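For example, one setting worth verifying before the test is the database-scoped MAXDOP, since a value of 1 would prevent a parallel plan regardless of hints; a quick sketch, to be run in the destination database:

-- Check the current database-scoped MAXDOP setting (0 means no limit)
SELECT name, value
	FROM sys.database_scoped_configurations
	WHERE name = 'MAXDOP';

-- If needed, let the engine choose the degree of parallelism
-- ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 0;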

 

 

TRUNCATE TABLE dbo.DestinationTable;

INSERT INTO dbo.DestinationTable WITH (TABLOCK)
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM sourcedb.dbo.TestSource;

 

 

Here is the estimated execution plan that we get when executing the load statement (INSERT INTO), showing parallelism on both the SCAN and the INSERT operations:

[Screenshot: parallel execution plan for the INSERT ... SELECT with TABLOCK]

The execution times are very significantly reduced: for database [db1] down to just 66 seconds (roughly a 1/3 improvement), while on the second database [db2] the time goes down to 49 seconds, an improvement of 14%, still clearly faster than the first database.

[Screenshot: execution time comparison for db1 and db2 with the parallel load]
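To confirm that the INSERT indeed executed with a parallel plan, you can check the degree of parallelism recorded in the plan cache; the sketch below relies on a simple text filter to locate the statement, which is an assumption for illustration:

-- Inspect the degree of parallelism and the last duration of the cached INSERT statement
SELECT TOP 5 qs.last_dop,
		qs.last_elapsed_time / 1000000. AS LastElapsedSeconds,
		st.text
	FROM sys.dm_exec_query_stats AS qs
	CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
	WHERE st.text LIKE '%INSERT INTO dbo.DestinationTable%'
		AND st.text NOT LIKE '%dm_exec_query_stats%' -- exclude this very query
	ORDER BY qs.last_execution_time DESC;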

 

Going faster with TempDB

Warning: as noted at the beginning of this post, we are considering here only repeatable ETL processes that can be re-executed multiple times, without loss of data, if needed.

 

All user databases in Azure SQL Managed Instance are currently confined to the FULL recovery model (think point-in-time recovery, making Availability Groups work, etc.), but there is one database that can ignore this rule and stay in the SIMPLE recovery model. Even more, that database uses Delayed Durability for its transactions. This much-loved and very much needed database is [tempdb]. Its content is reset with every failover between instances and it is volatile, but if we need to land reloadable data really fast on a SQL MI with the General Purpose service tier, this can be one of the best opportunities.
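You can verify these [tempdb] properties yourself with a quick query against sys.databases:

-- Confirm the recovery model and the delayed durability setting of tempdb
SELECT name, 
		recovery_model_desc, 
		delayed_durability_desc
	FROM sys.databases
	WHERE name = 'tempdb';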

 

There is one more thing that needs to be added to this story, a reminder so to say.
Every General Purpose instance gets 24 GB of fast local SSD storage per CPU vCore, so my test SQL MI has 96 GB of this local SSD storage, which will help improve my execution times amazingly.
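A quick way to see what [tempdb] has at its disposal is to list its files and their sizes (on the General Purpose service tier they reside on the local SSD volume):

-- List the tempdb files, their current sizes and their physical location
SELECT name, 
		physical_name, 
		CAST(size * 8. / 1024 AS DECIMAL(12,2)) AS SizeInMB
	FROM tempdb.sys.database_files;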

 

By running the code below (with and without the TABLOCK hint) we can test this solution:

 

 

use tempdb;

DROP TABLE IF EXISTS dbo.DestinationTable;

CREATE TABLE dbo.DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

TRUNCATE TABLE dbo.DestinationTable;
INSERT INTO dbo.DestinationTable WITH (TABLOCK)
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM sourcedb.dbo.TestSource;

 

 

If you do not want to or cannot switch your context to the tempdb database, feel free to use temporary tables, as shown in the script below:

 

 

use db1;

DROP TABLE IF EXISTS #DestinationTable;

CREATE TABLE #DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

INSERT INTO #DestinationTable WITH (TABLOCK)
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM sourcedb.dbo.TestSource;

 

 

The execution time results are amazing: just 6 seconds for a single-threaded execution and 1 second for the parallel data load, completely obliterating the other databases' execution times (which can still be improved by increasing their data & log file sizes, but will never truly compete with the [tempdb] execution times):

[Screenshot: execution times for the tempdb data load]

Delayed Durability:

If your data loading process contains a lot of repeatable singular transactions, you should consider enabling delayed durability, which will allow you to load data faster than waiting for each transaction log flush would allow. For more information, please consult the official Microsoft documentation.
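As a sketch of what enabling it could look like for one of the test databases from this article (please weigh the potential loss of the most recent transactions on a failure before relying on it):

-- Allow delayed durable transactions at the database level
ALTER DATABASE db2 SET DELAYED_DURABILITY = ALLOWED;
GO

-- ... and request delayed durability explicitly when committing a load transaction
BEGIN TRAN;
	-- data loading statements go here
COMMIT TRAN WITH (DELAYED_DURABILITY = ON);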

 

Committing your writes:

The last angle we wanted to approach in this article is grouping and committing your write operations. It might seem a rather obvious thing, but given that many people still avoid explicit T-SQL transactions like fire, we thought it would be great to remind readers of their usefulness.

Consider a rather simple loading procedure that does not group the write operations but executes them in a loop, committing each one individually and persisting additional information into the transaction log, thus filling it with unnecessary entries (unless each transaction is truly atomic, which should be rare in repeatable ETL processes):

use db2;

DROP TABLE IF EXISTS dbo.DestinationTable;

CREATE TABLE dbo.DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

-- Loading 250 thousand rows as individual (implicit) transactions
set nocount on
declare @i as int;
set @i = 1;

while @i <= 250000
begin
	insert into dbo.DestinationTable 
		values (@i, 'First Name', 'LastName', GetDate());

	set @i = @i + 1;
end;

This process takes a huge amount of time to execute on my test SQL MI instance: around 695 seconds, to be precise.

 

The first step towards improvement is to investigate the wait statistics related to this execution in order to analyze what could have caused the slowness. For this purpose we shall use the sys.dm_exec_session_wait_stats DMV, which returns information about all waits encountered by each session:

SELECT TOP 5 session_id, wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms  
	FROM sys.dm_exec_session_wait_stats 
	WHERE session_id = 95 -- This is the session id (@@SPID) of the respective session
	ORDER BY wait_time_ms DESC;

[Screenshot: sys.dm_exec_session_wait_stats output showing WRITELOG as the top wait]

It's clear that our top wait is related to transaction log operations, so why did this happen? Could it be an IOPS problem? Let's review the historical Resource Governor metrics to identify whether IOPS were the culprit.

 

To review this you can use the sys.dm_resource_governor_workload_groups_history_ex DMV, which exposes the latest metrics for the default workload group in the Resource Governor. You can use this query to get the data:

SELECT snapshot_time, 
   delta_writes_completed, 
   delta_reads_completed 
   FROM SYS.DM_RESOURCE_GOVERNOR_WORKLOAD_GROUPS_HISTORY_EX 
   WHERE name='default' 
   ORDER BY snapshot_time DESC;

In our test case this has produced the following results:

[Screenshot: sys.dm_resource_governor_workload_groups_history_ex output for the default workload group]

Reimagining this process and putting the WHILE loop inside an explicit transaction helps a lot: on the user database [db2] the load now takes just 7 seconds to execute:

use db2;
GO

DROP TABLE IF EXISTS dbo.DestinationTable;

CREATE TABLE dbo.DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );


-- Loading 250 Thousand Rows inside a single transaction
set nocount on
declare @i as int;
set @i = 1;

begin tran
while @i <= 250000
begin
	insert into dbo.DestinationTable 
		values (@i, 'First Name', 'LastName', GetDate());

	set @i = @i + 1;
end;
commit;

Running the same script against [tempdb] takes just 3 seconds, and as you can see the wait statistics are very different: all the waits related to the transaction log are gone and the top wait is related to [tempdb] file auto-growth. This impact can be lowered further if we 'warm up' the [tempdb] files by making them grow in advance, forcing the [tempdb] database to do this work at times when the impact will not be felt by the processes and the end users.

[Screenshot: wait statistics after the explicit transaction load]
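If a single huge transaction is not desirable (for example, to limit transaction log growth or the duration of locks), a middle ground, not measured in this article, is to commit in batches; below is a sketch of that pattern against the same dbo.DestinationTable, with an arbitrary batch size of 50,000 rows:

-- Loading 250 thousand rows, committing every 50,000 rows
set nocount on
declare @i as int = 1;

begin tran
while @i <= 250000
begin
	insert into dbo.DestinationTable 
		values (@i, 'First Name', 'LastName', GetDate());

	-- Commit the current batch and immediately open the next one
	if @i % 50000 = 0
	begin
		commit;
		begin tran;
	end;

	set @i = @i + 1;
end;
commit;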

On the table below you can see the comparison between the executed operations and their results, with the respective dominant wait types:

Operation                              | Database       | Exec Time   | Top Wait
Execution using implicit transactions  | User database  | 695 seconds | WRITELOG
Execution using explicit transaction   | User database  | 7 seconds   | LOGBUFFER
Execution using explicit transaction   | TempDB         | 3 seconds   | PREEMPTIVE_OS_WRITEFILEGATHER

 

[Image: execution time comparison chart]

While these examples do not guarantee that your particular workloads will behave in the same way, these are some of the basic techniques that you can apply to improve your SQL MI ETL performance.
