Today, I worked on a service request where our customer wanted to download a large result set from Azure SQL Database to an on-premises server. The table has several large-object columns (XML, varchar(max), text) and millions of rows. In this situation, I would like to share several tests that I ran and how I reduced the download time.
Initial points
- Try increasing the packet size in your connection string to a higher value (see the connection-string sketch after this list).
- Instead of the Proxy connection policy, use the Redirect connection policy to improve connection latency.
- Regarding redirection, remember to use the latest drivers, because some older drivers are not able to use redirection.
- As this is pure data processing, if possible use the Premium or Business Critical tier to reduce I/O latency.
- On premises, try to place the data and log files on different drives to improve I/O.
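To illustrate the first two points, here is a minimal sketch of what the source connection string could look like. The server, database and credential values are placeholders, 32767 is simply the maximum packet size to test with (the default is 8000), and GetConnectionStringSource() is a hypothetical helper name chosen as a counterpart to the GetConnectionStringTarget() helper used in the code later in this post. The Redirect policy itself is configured on the Azure SQL logical server, not in the connection string.

// Minimal sketch (placeholder server, database and credentials).
// "C" is assumed to be an alias for the SqlClient namespace, e.g.:
//   using C = System.Data.SqlClient;
private string GetConnectionStringSource()
{
    var builder = new C.SqlConnectionStringBuilder
    {
        DataSource = "tcp:myserver.database.windows.net,1433", // placeholder
        InitialCatalog = "MyDatabase",                         // placeholder
        UserID = "myuser",                                     // placeholder
        Password = "mypassword",                               // placeholder
        Encrypt = true,
        PacketSize = 32767 // larger network packet size; the default is 8000
    };
    return builder.ConnectionString;
}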
In Azure SQL Database, I created a table and filled it with data:
- Basically, I created the following table:
CREATE TABLE [dbo].[Destination](
[ID] [int] IDENTITY(1,1) NOT NULL,
[Name1] [varchar](4000) NULL,
[Name2] [varchar](4000) NULL,
[Name3] [varchar](4000) NULL,
[Name4] [varchar](4000) NULL,
[Name5] [varchar](4000) NULL,
PRIMARY KEY CLUSTERED
(
[ID] ASC
))
- Running the following queries multiple times, I got around 7 million rows (the second INSERT ... SELECT doubles the row count on every execution).
INSERT INTO Destination (Name1,Name2,Name3,Name4,Name5) values(Replicate('X',4000),Replicate('X',4000),Replicate('X',4000),Replicate('X',4000),Replicate('X',4000))
INSERT INTO DESTINATION (Name1,Name2,Name3,Name4,Name5) SELECT Name1,Name2,Name3,Name4,Name5 FROM DESTINATION
On-premises:
- I developed a small C# application that runs three different processes:
- The first process read the whole table from Azure SQL Database and downloaded the data using SqlBulkCopy, but the time spent was high. I saw that the transfer rate was only about 100-200 mb/s. The method below receives a SqlDataReader over the source table; a sketch of how that reader can be created follows the code.
// "C" is assumed to be an alias for the SqlClient namespace, e.g.:
//   using C = System.Data.SqlClient;
private void LoadDataReaderWithoutCompression(C.SqlDataReader newProducts, int lCutOver = 10000, string sDestinationTable = "Destination")
{
    // Bulk copy into the on-premises table, keeping identity values and NULLs
    // and taking a table lock for faster inserts.
    using (C.SqlBulkCopy bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0),
        C.SqlBulkCopyOptions.KeepIdentity | C.SqlBulkCopyOptions.KeepNulls | C.SqlBulkCopyOptions.TableLock))
    {
        bulkCopy.DestinationTableName = sDestinationTable;
        try
        {
            bulkCopy.BulkCopyTimeout = 6000;
            // Report progress every 2,000 rows copied.
            bulkCopy.SqlRowsCopied += new C.SqlRowsCopiedEventHandler(OnSqlRowsCopied);
            bulkCopy.NotifyAfter = 2000;
            bulkCopy.EnableStreaming = false;
            bulkCopy.BatchSize = lCutOver;
            bulkCopy.WriteToServer(newProducts);
        }
        catch (Exception ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
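The newProducts reader passed into this method is not shown above; a minimal sketch of how it could be created on the Azure SQL Database side, assuming the hypothetical GetConnectionStringSource() helper from earlier, might look like this:

// Minimal sketch (assumption: GetConnectionStringSource() returns the Azure SQL
// Database connection string). CommandBehavior.SequentialAccess lets the large
// columns be streamed instead of fully buffered per row.
using (var sourceConnection = new C.SqlConnection(GetConnectionStringSource()))
{
    sourceConnection.Open();
    using (var command = new C.SqlCommand(
        "SELECT ID, Name1, Name2, Name3, Name4, Name5 FROM [Destination]", sourceConnection))
    {
        command.CommandTimeout = 6000;
        using (C.SqlDataReader newProducts =
            command.ExecuteReader(System.Data.CommandBehavior.SequentialAccess))
        {
            LoadDataReaderWithoutCompression(newProducts);
        }
    }
}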
- The second process compressed the data using the COMPRESS function before downloading it. Basically, the idea was the following (a C# sketch putting these steps together appears after the table definition below):
- Create a staging table with the structure shown below.
- Execute the query: INSERT INTO [_M$_Destination_X] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,COMPRESS(NAME1) AS NAME1,COMPRESS(NAME2) AS NAME2,COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4, COMPRESS(NAME5) AS NAME5 FROM Destination
- Download the compressed data using SqlBulkCopy.
- Decompress the data in the destination by running the following T-SQL: INSERT INTO [Destination] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,DECOMPRESS(NAME1) AS NAME1,DECOMPRESS(NAME2) AS NAME2,DECOMPRESS(NAME3) AS NAME3, DECOMPRESS(NAME4) AS NAME4, DECOMPRESS(NAME5) AS NAME5 FROM [_M$_Destination_X]
CREATE TABLE [dbo].[_M$_Destination_X](
[ID] [int] NOT NULL,
[Name1] [varbinary](max) NULL,
[Name2] [varbinary](max) NULL,
[Name3] [varbinary](max) NULL,
[Name4] [varbinary](max) NULL,
[Name5] [varbinary](max) NULL,
PRIMARY KEY CLUSTERED ( [ID] ASC ))
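Putting the steps of this second process together, a minimal C# sketch could look like the following. It assumes the hypothetical GetConnectionStringSource() helper from above, the GetConnectionStringTarget(0) helper from the earlier method, and that [_M$_Destination_X] exists both in Azure SQL Database and on-premises.

// Minimal sketch of the second process (compress in Azure, download, decompress on-premises).
private void DownloadWithCompression()
{
    using (var azureConnection = new C.SqlConnection(GetConnectionStringSource()))
    {
        azureConnection.Open();

        // 1. Compress the data server-side in Azure SQL Database.
        var compress = new C.SqlCommand(
            "INSERT INTO [_M$_Destination_X] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) " +
            "SELECT ID,COMPRESS(NAME1),COMPRESS(NAME2),COMPRESS(NAME3),COMPRESS(NAME4),COMPRESS(NAME5) FROM Destination",
            azureConnection) { CommandTimeout = 6000 };
        compress.ExecuteNonQuery();

        // 2. Download the compressed rows with SqlBulkCopy.
        var read = new C.SqlCommand(
            "SELECT ID,NAME1,NAME2,NAME3,NAME4,NAME5 FROM [_M$_Destination_X]",
            azureConnection) { CommandTimeout = 6000 };
        using (var reader = read.ExecuteReader())
        using (var bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0)))
        {
            bulkCopy.DestinationTableName = "[_M$_Destination_X]";
            bulkCopy.BulkCopyTimeout = 6000;
            bulkCopy.WriteToServer(reader);
        }
    }

    // 3. Decompress on-premises into the final table
    //    (SET IDENTITY_INSERT may be required if ID is an identity column there).
    using (var onPremConnection = new C.SqlConnection(GetConnectionStringTarget(0)))
    {
        onPremConnection.Open();
        var decompress = new C.SqlCommand(
            "INSERT INTO [Destination] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) " +
            "SELECT ID,DECOMPRESS(NAME1),DECOMPRESS(NAME2),DECOMPRESS(NAME3),DECOMPRESS(NAME4),DECOMPRESS(NAME5) FROM [_M$_Destination_X]",
            onPremConnection) { CommandTimeout = 6000 };
        decompress.ExecuteNonQuery();
    }
}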
- This second process worked very well, because compressing in Azure and decompressing on-premises allowed me to work around the network bottleneck. But only a single thread was running; what happens when there are millions and millions of rows? In this situation, I modified the source code to use a configurable number of threads (for example, running 10 threads in parallel, each one reading 150,000 rows). A sketch of this parallel variant follows this list.
- Every thread read 150,000 rows using the following T-SQL (each thread uses its own OFFSET value and its own staging table): INSERT INTO [_Tmp100] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,COMPRESS(NAME1) AS NAME1,COMPRESS(NAME2) AS NAME2,COMPRESS(NAME3) AS NAME3, COMPRESS(NAME4) AS NAME4, COMPRESS(NAME5) AS NAME5 FROM Destination ORDER BY ID OFFSET 0 ROWS FETCH NEXT 150000 ROWS ONLY
- Using SqlBulkCopy, I transferred the data to the on-premises server.
- Finally, running the following query, I was able to decompress the data and save it into the destination table: INSERT INTO [Destination] (ID,NAME1,NAME2,NAME3,NAME4,NAME5) SELECT ID,DECOMPRESS(NAME1) AS NAME1,DECOMPRESS(NAME2) AS NAME2,DECOMPRESS(NAME3) AS NAME3, DECOMPRESS(NAME4) AS NAME4, DECOMPRESS(NAME5) AS NAME5 FROM [_Tmp100]
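For reference, here is a minimal sketch of how this parallel variant could be structured. The per-chunk staging table naming ([_Tmp<n>]), the 150,000-row chunk size and the 10-thread limit are assumptions based on the description above; after all chunks have been downloaded, the DECOMPRESS ... INSERT shown above is run once per staging table on the on-premises side.

// Minimal sketch of the parallel variant (assumptions: one staging table per chunk
// named [_Tmp<chunk>], plus the GetConnectionStringSource()/GetConnectionStringTarget(0)
// helpers used earlier).
private void DownloadWithCompressionInParallel(int totalRows, int chunkSize = 150000, int maxThreads = 10)
{
    int chunks = (totalRows + chunkSize - 1) / chunkSize;

    System.Threading.Tasks.Parallel.For(0, chunks,
        new System.Threading.Tasks.ParallelOptions { MaxDegreeOfParallelism = maxThreads },
        chunk =>
        {
            string stagingTable = "[_Tmp" + chunk + "]";
            int offset = chunk * chunkSize;

            using (var azureConnection = new C.SqlConnection(GetConnectionStringSource()))
            {
                azureConnection.Open();

                // Compress this chunk server-side using OFFSET/FETCH, as in the query above.
                var compress = new C.SqlCommand(
                    "INSERT INTO " + stagingTable + " (ID,NAME1,NAME2,NAME3,NAME4,NAME5) " +
                    "SELECT ID,COMPRESS(NAME1),COMPRESS(NAME2),COMPRESS(NAME3),COMPRESS(NAME4),COMPRESS(NAME5) " +
                    "FROM Destination ORDER BY ID OFFSET @Offset ROWS FETCH NEXT @Fetch ROWS ONLY",
                    azureConnection) { CommandTimeout = 6000 };
                compress.Parameters.AddWithValue("@Offset", offset);
                compress.Parameters.AddWithValue("@Fetch", chunkSize);
                compress.ExecuteNonQuery();

                // Download the compressed chunk into the matching on-premises staging table.
                var read = new C.SqlCommand(
                    "SELECT ID,NAME1,NAME2,NAME3,NAME4,NAME5 FROM " + stagingTable,
                    azureConnection) { CommandTimeout = 6000 };
                using (var reader = read.ExecuteReader())
                using (var bulkCopy = new C.SqlBulkCopy(GetConnectionStringTarget(0)))
                {
                    bulkCopy.DestinationTableName = stagingTable;
                    bulkCopy.BulkCopyTimeout = 6000;
                    bulkCopy.WriteToServer(reader);
                }
            }
        });
}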
In the end, I was able to reduce the time spent on this process by hours. I also collected other lessons learned on the on-premises side to speed up the process and reduce PAGEIOLATCH contention, but that will be for another post.
Enjoy!