Improving data loading performance on SQL Managed Instance with Business Critical service tier
Published Feb 02 2022 12:57 PM

In this blog post we shall consider some strategies for improving data loading performance in Azure SQL Managed Instance running the Business Critical service tier. These strategies apply to repeatable ETL processes, meaning that if there is a problem with the process or the data itself, the load can be repeated without any loss of data or consistency. Naturally, these strategies require a design prepared for such repetition, but that is a basic requirement for any data loading process.

 

This post focuses on the Business Critical service tier and serves as a continuation of the first post in the series, which focused on improving data loading performance on SQL Managed Instance with the General Purpose service tier.

 

If you are interested in other posts on how to discover different aspects of SQL MI, please visit http://aka.ms/sqlmi-howto, which serves as a placeholder for the series.

 

Setup:

For the test case I have created a SQL Managed Instance with the Business Critical service tier, 8 CPU vCores and 1 TB of reserved space. I then created a simple new database [testdb] with all the default settings, using a plain CREATE DATABASE command. Given that local SSD storage performance does not vary with the size of the data & log files, growing them is not strictly necessary, but I still set both to 10 GB to rule out any impact from file auto-growth, and did not go a step further.

 

CREATE DATABASE [testdb];
ALTER DATABASE [testdb] MODIFY FILE ( NAME = N'data_0', SIZE = 10GB );
ALTER DATABASE [testdb] MODIFY FILE ( NAME = N'log', SIZE = 10GB );
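
To confirm that the new sizes took effect, you can check sys.database_files – a quick sanity check (size is reported in 8 KB pages):

-- Confirm the resulting data & log file sizes
SELECT name, type_desc, size * 8 / 1024 AS size_mb
	FROM sys.database_files;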

 

Using the same table schema and data as in the previous article, we shall load 2 million rows with the following code, which creates a new [TestSource] table with a primitive test data structure:

 

DROP TABLE IF EXISTS dbo.TestSource;

CREATE TABLE dbo.TestSource(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

-- Loading Rows 
set nocount on
declare @i as int;
set @i = 1;

BEGIN TRAN
WHILE @i <= 2000000
BEGIN
	INSERT INTO dbo.TestSource 
		VALUES (@i, 'First Name', 'LastName', GetDate());

	SET @i = @i + 1;
END;
COMMIT TRAN;
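
A quick check to confirm that all 2 million rows have landed in the source table:

SELECT COUNT(*) AS rows_loaded FROM dbo.TestSource;	-- expected: 2000000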

 

A very important difference between the Business Critical & General Purpose service tiers is the support for In-Memory (memory-optimized) tables on the Business Critical service tier. This technology improves concurrency for OLTP workloads by eliminating waits for locks & latches. Besides being very useful for OLTP workloads, In-Memory tables can deliver impactful improvements for some ETL processes.

 

In-Memory technology allows building 2 types of tables – schema-and-data tables and schema-only tables. While schema-and-data tables persist both the schema and the data, schema-only tables will not persist the data after a failover or an instance restart. Schema-only tables are functionally comparable to temporary tables, the important difference being that schema-only tables do not have to write their data to disk at all, theoretically giving them an advantage over temporary tables, which still need to be persisted in the [tempdb] database.

 

Additionally, In-Memory tables can use 2 types of nonclustered indexes – Hash and Bw-tree, each with its own usage scenarios and advantages. Nonclustered hash indexes largely target individual lookups, while nonclustered Bw-tree indexes are great for situations where queries are expected to return a range of values, resulting in a range scan of the respective table.
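
To make the distinction concrete, here is a minimal sketch of the two access patterns, using the tables that we shall create below:

-- Point lookup: the sweet spot for a nonclustered hash index
SELECT name, lastname
	FROM dbo.InMemTableHash
	WHERE id = 42;

-- Range scan: the sweet spot for a nonclustered Bw-tree index
SELECT COUNT(*)
	FROM dbo.InMemTable
	WHERE id BETWEEN 1000 AND 2000;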

 

Nonclustered hash indexes allow you to specify the number of buckets they will use, and for performance purposes we shall be considering different values as well. For detailed recommendations on sizing the hash buckets, please consult the respective documentation.

For the test and comparison, we shall create 4 destination tables using In-Memory technology, which will be used as the distinct destinations:

  • [InMemTableHash] – an In-Memory table that uses a hash index with 300,000 buckets
  • [InMemTableHash_SchemaOnly] – an In-Memory Schema-Only table that uses a hash index with 300,000 buckets
  • [InMemTable] – an In-Memory table that uses a Bw-tree index
  • [InMemTable_SchemaOnly] – an In-Memory Schema-Only table that uses a Bw-tree index

 

IF EXISTS (SELECT NAME FROM sys.objects WHERE NAME = 'InMemTableHash')
   DROP TABLE [dbo].[InMemTableHash]
GO

IF EXISTS (SELECT NAME FROM sys.objects  WHERE NAME = 'InMemTableHash_SchemaOnly')
   DROP TABLE [dbo].[InMemTableHash_SchemaOnly]
GO

IF EXISTS (SELECT NAME FROM sys.objects WHERE NAME = 'InMemTable')
   DROP TABLE [dbo].[InMemTable]
GO

IF EXISTS (SELECT NAME FROM sys.objects  WHERE NAME = 'InMemTable_SchemaOnly')
   DROP TABLE [dbo].[InMemTable_SchemaOnly]
GO

-- Create a memory-optimized (schema-and-data) table with the hash index
CREATE TABLE [dbo].[InMemTableHash] (
	id int not null  PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=300000),
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Create a memory-optimized Schema-only table with the hash index
CREATE TABLE [dbo].[InMemTableHash_SchemaOnly] (
	id int not null  PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=300000),
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_ONLY);
GO

-- Create a memory-optimized (schema-and-data) table with the nonclustered Bw-tree index
CREATE TABLE [dbo].[InMemTable] (
	id int not null  PRIMARY KEY NONCLUSTERED,
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Create a memory-optimized Schema-only table with the nonclustered Bw-tree index
CREATE TABLE [dbo].[InMemTable_SchemaOnly] (
	id int not null  PRIMARY KEY NONCLUSTERED,
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_ONLY);
GO

 


Performance Tests:

Let’s start with the same test that we did for the General Purpose service tier – loading the data into the regular disk-based table and measuring its performance:
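
The [DiskTable] destination comes from the previous article in the series; if you are starting from scratch, a matching definition (assumed here to mirror [TestSource]) would be:

-- Assumed [DiskTable] definition, mirroring [TestSource] from the previous article
DROP TABLE IF EXISTS dbo.DiskTable;

CREATE TABLE dbo.DiskTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );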

 

TRUNCATE TABLE dbo.DiskTable;
INSERT INTO dbo.DiskTable WITH (TABLOCK)
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

On this test SQL Managed Instance it takes around 24 seconds to execute this task – a great improvement compared to the 66 seconds obtained on the General Purpose service tier. This is, however, still very far from what we managed to achieve with the help of temporary tables.

Let’s see what kind of performance an In-Memory table can bring:

 

DELETE FROM dbo.InMemTableHash;

INSERT INTO dbo.InMemTableHash
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

The execution plan for the insertion of the data is presented below, and you can see that it is not using parallelism:

[Image NikoNeugebauer_0-1643830502204.png: execution plan for the In-Memory insert, with no parallelism]

And if you look inside the execution plan, the reason is quite clear – such DML operations do not support parallelism:

[Image NikoNeugebauer_1-1643830502208.png: execution plan detail showing why the plan is not parallel]

The execution time for this query is a comparable 25 seconds, even though it runs single-threaded.

 

The same data loading technique applied to the hash table that is Schema-only, and thus needs no persistence to disk, results in a significant improvement, lowering the execution time to just 6 seconds:

 

-- Hash Schema-Only table
DELETE FROM dbo.InMemTableHash_SchemaOnly;

INSERT INTO dbo.InMemTableHash_SchemaOnly
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

This is a 4x performance improvement compared to the original, but can we get an even better result with the nonclustered Bw-tree index, which does not face hash collisions and can bring significant improvements for reads?

 

DELETE FROM dbo.InMemTable;
INSERT INTO dbo.InMemTable
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

The real result differed from the initial expectations – balancing all the insertions into the Bw-tree primary key is not a fast task – it took a whopping 30 seconds to complete.
In the image below you can see the wait stats for the actual execution plan; notice that memory allocation caused more than 50% of the wait time – around 18 seconds:

[Image NikoNeugebauer_2-1643830771882.png: wait statistics for the Bw-tree insert execution plan]

Running the same data insertion for the Schema-only table with the nonclustered Bw-tree index gives a tiny improvement – we lower the execution time to 5.3 seconds, which is still far from the desired result, but it is the best we have been able to get thus far:

 

-- Nonclustered Bw-Tree Schema Only
DELETE FROM dbo.InMemTable_SchemaOnly;

INSERT INTO dbo.InMemTable_SchemaOnly 
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

Here is a comparison of the results achieved so far. You can clearly see that Schema-only tables are the way to go if you are looking for performance, as long as you accept that an eventual failover will make the data in the table disappear, just as with a temporary table:

[Image NikoNeugebauer_3-1643830891844.png: load-time comparison of the methods tested so far]

In the case of the hash table, we so far loaded data into a table with fewer buckets than the number of rows inserted, which resulted in slow performance. But what if we size it correctly, with double the number of hash buckets relative to the number of rows – recreating the same tables with 4 million buckets this time:

 

-- OK, let's fix things with the right number of hash buckets
IF EXISTS (SELECT NAME FROM sys.objects WHERE NAME = 'InMemTableHash')
   DROP TABLE [dbo].[InMemTableHash]
GO

IF EXISTS (SELECT NAME FROM sys.objects  WHERE NAME = 'InMemTableHash_SchemaOnly')
   DROP TABLE [dbo].[InMemTableHash_SchemaOnly]
GO

-- Create the memory-optimized (schema-and-data) table with 4 million hash buckets
CREATE TABLE [dbo].[InMemTableHash] (
	id int not null  PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=4000000),
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_AND_DATA);
GO

-- Create the 2nd, Schema-only, memory-optimized table with 4 million hash buckets
CREATE TABLE [dbo].[InMemTableHash_SchemaOnly] (
	id int not null  PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT=4000000),
	name char (200),
	lastname char(200),
	logDate datetime 
) WITH (MEMORY_OPTIMIZED=ON, DURABILITY = SCHEMA_ONLY);
GO

 

Executing this process brings some improvement, though not a very significant one:

 

DELETE FROM dbo.InMemTableHash;
INSERT INTO dbo.InMemTableHash
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

DELETE FROM dbo.InMemTableHash_SchemaOnly;
INSERT INTO dbo.InMemTableHash_SchemaOnly
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

The first query, against the schema-and-data table with the hash index, results in largely the same 25 seconds of execution time, while the Schema-only table with the hash index improves to 4.7 seconds – roughly half a second faster than before.
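
To verify whether a chosen bucket count is adequate for the loaded data, you can inspect the hash index statistics – a minimal sketch using the sys.dm_db_xtp_hash_index_stats DMV:

-- Long average chains suggest too few buckets,
-- while a very high share of empty buckets suggests too many
SELECT OBJECT_NAME(hs.object_id) AS table_name,
	i.name AS index_name,
	hs.total_bucket_count,
	hs.empty_bucket_count,
	hs.avg_chain_length,
	hs.max_chain_length
	FROM sys.dm_db_xtp_hash_index_stats AS hs
		INNER JOIN sys.indexes AS i
			ON hs.object_id = i.object_id AND hs.index_id = i.index_id;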

[Image NikoNeugebauer_4-1643831317850.png: load times with the corrected 4 million bucket count]

Given that all storage on the Business Critical (BC) service tier is locally attached SSD, we cannot expect the same relative gains from the [tempdb] tricks as on General Purpose, but I thought we should still run the test to see how much improvement the technique of SIMPLE recovery and Delayed Durability (which [tempdb] gets by default) can give.

 

If you are loading data into temporary tables on a SQL Managed Instance running the Business Critical service tier, then depending on the size of the data you might notice some improvement compared to the General Purpose service tier, since the source data is local and reads are faster, but this is largely where the performance improvement ends.

 

Running the same workload as on the General Purpose service tier, with a temporary table as the destination, results in an incredible 0.7 seconds of execution time:

 

SET STATISTICS TIME ON
DROP TABLE IF EXISTS #DestinationTable;

CREATE TABLE #DestinationTable(
	id int not null,
	name char (200),
	lastname char(200),
	logDate datetime );

INSERT INTO #DestinationTable WITH (TABLOCK)
	(id, name, lastname, logdate)
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource;

 

In the final image below you can see the comparison of all tested methods and where the best performance is to be found. I would stress, however, that inserts into memory-optimized tables run single-threaded, which makes them a perfect candidate for a more complex but more rewarding approach: splitting the insertion between multiple processes or threads. For the Schema-only table, for example, this can lower those 4.7 seconds to around 0.6 seconds by virtue of parallel sessions – below even the 0.7 seconds achieved with the temporary table. A sketch of such a split follows the experiment below.

[Image NikoNeugebauer_5-1643831440429.png: final comparison of all tested data loading methods]

For testing purposes you can run a rather simple experiment by selecting just 1/8 of the data and loading it into the In-memory hash Schema-only table:

 

DELETE FROM dbo.InMemTableHash_SchemaOnly;
INSERT INTO dbo.InMemTableHash_SchemaOnly
	SELECT TOP 250000 id, name, lastname, logDate
		FROM dbo.TestSource;
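
Extending that idea into an actual split: here is a minimal sketch where each of 8 concurrent sessions runs the same statement with its own @sessionId value (the modulo-based split and the variable names are illustrative assumptions, not tested code from this article):

-- Run one copy per session, with @sessionId = 0..7;
-- each session inserts only its own slice of the 2 million rows
DECLARE @sessionId int = 0, @totalSessions int = 8;

INSERT INTO dbo.InMemTableHash_SchemaOnly
	SELECT id, name, lastname, logDate
		FROM dbo.TestSource
		WHERE id % @totalSessions = @sessionId;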

 

Also notice that reading data from a Schema-only table has its own advantages – among others, it involves no disk access at all – while at the same time being limited by the amount of memory available on your Managed Instance.
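
To keep an eye on that memory consumption, you can query the sys.dm_db_xtp_table_memory_stats DMV – a minimal sketch:

-- Memory consumed by each memory-optimized table and its indexes
-- (negative object_ids belong to internal system tables and are filtered out)
SELECT OBJECT_NAME(object_id) AS table_name,
	memory_used_by_table_kb,
	memory_used_by_indexes_kb
	FROM sys.dm_db_xtp_table_memory_stats
	WHERE object_id > 0;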


On the Business Critical service tier you have one more advantage for loading data fast – In-Memory tables, which with the right planning and execution can give you great performance while avoiding overloading the disk & log throughput.
