Lesson Learned #373: Improving download transfer rate for LOB columns in Azure SQL Database
Published Jun 16 2023 12:25 PM 1,739 Views

A few days ago, I have been working on reducing the data download transfer rate to improve the download ratio for Proxy or Redirect connections. I would like to share my lessons learned with two examples, using the COMPRESS and DECOMPRESS commands of SQL Server. We know, based on, this article: Azure SQL Database connectivity architecture - Azure SQL Database and Azure Synapse Analytics | Micr... the recomendation to use Redirect versus Proxy - "We highly recommend the Redirect connection policy over the Proxy connection policy for the lowest latency and highest throughput". 

 

We can see this difference in LOB fields that contain a lot of information to be downloaded to the connected client. "LOB" stands for "Large Object," which refers to a data type in databases used to store large amounts of binary information, such as images, audio files, videos, or other sizable data. LOB fields are designed to efficiently handle the manipulation and storage of this data in the database.

 

Well, for this reason, I would like to test, compare and review the results. 

 

The first thing, let's try to create a table in a Standard 2 in Azure SQL Database. 

 

 

CREATE TABLE [dbo].[A](
	[id] [int] IDENTITY(1,1) NOT NULL,
	[Data] [varbinary](max) NULL,
PRIMARY KEY CLUSTERED 
(
	[id] ASC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

 

 

 

The above SQL statement creates a table named "A" in the "dbo" schema. The table has the following columns:

  1. [id]: This is an integer column defined as an identity column with an increment of 1 and a seed of 1. .

  2. [Data]: This column is defined as a variable-length binary (varbinary) data type with a maximum size of "max". 

In order to reduce the amount of data transferred we are going to add the data using COMPRESS function of SQL Server. 

 

 

insert into A (DATA) VALUES (COMPRESS(CONVERT(VARBINARY(MAX),REPLICATE(CONVERT(VARCHAR(MAX),'Microsoft'),11000))))

--- Add a number of rows that you prefer
INSERT INTO a (data) select data from a

 

 

I'm going to add 524288 rows.  

The first PowerShell script that I'm going to use is using ExecuteReader for every existing row. 

 

 


Add-Type -TypeDefinition @"
using System;
using System.IO;
using System.IO.Compression;

public class GZipHelper
{
    public static byte[] Decompress(byte[] compressedData)
    {
        using (var compressedStream = new MemoryStream(compressedData))
        using (var decompressedStream = new MemoryStream())
        {
            using (var gzipStream = new GZipStream(compressedStream, CompressionMode.Decompress))
            {
                gzipStream.CopyTo(decompressedStream);
            }

            return decompressedStream.ToArray();
        }
    }
}
"@


$SrvNameConn1 = "SrvUsingProxy.database.windows.net"
$SrvNameConn2 = "SrvUsingRedirect.database.windows.net"

$Database = "dBnAME"
$Username = "User"
$Password = "Pwd" 
$NumberExecutions = 10000

$bRead= $true

cls

try
{
  $sw = [diagnostics.stopwatch]::StartNew()

  $query1 = "SELECT Data FROM A"
  $query2 = "SELECT DESCOMPRESS(Data) FROM A"

  $connectionStringConn1 = "Server=tcp:$SrvNameConn1,1433;Initial Catalog=$Database;Persist Security Info=False;User ID=$Username;Password=$Password;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=30;Pooling=False;Packet Size=8000;TrustServerCertificate=True;Application Name='Test MS APP#1'"
  $connectionConn1 = New-Object -TypeName System.Data.SqlClient.SqlConnection($connectionStringConn1)
  $connectionConn1.StatisticsEnabled = 1 

  $commandConn1 = New-Object -TypeName System.Data.SqlClient.SqlCommand
  $commandConn1.CommandTimeout = 60
  $commandConn1.Connection=$connectionConn1
  $commandConn1.CommandText = $query1
  $commandConn1.Prepare()

  $connectionStringConn2 = "Server=tcp:$SrvNameConn2,1433;Initial Catalog=$Database;Persist Security Info=False;User ID=$Username;Password=$Password;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=30;Pooling=False;Packet Size=8000;TrustServerCertificate=True;Application Name='Test MS APP#2'"
  $connectionConn2 = New-Object -TypeName System.Data.SqlClient.SqlConnection($connectionStringConn2)
  $connectionConn2.StatisticsEnabled = 1 
  
  $commandConn2 = New-Object -TypeName System.Data.SqlClient.SqlCommand
  $commandConn2.CommandTimeout = 60
  $commandConn2.Connection=$connectionConn2
  $commandConn2.CommandText = $query2
  $commandConn1.Prepare()

  $lBytesRecdConn1 = 0
  $lBytesRecdConn2 = 0

  $connectionConn1.Open()
  $connectionConn2.Open()
  
  $lNewTime1 = 0
  $lNewTime2 = 0

  $lNewTimeTotal1 = 0
  $lNewTimeTotal2 = 0


  $SqlAdapter = New-Object System.Data.SqlClient.SqlDataAdapter

  write-Output ("Query: " +$query1) 
  write-Output ("Query: " +$query2) 

  for ($i=1; $i -le $NumberExecutions; $i++)
  {
   try
    {
  

     $connectionConn1.ResetStatistics()
     $startConn1 = get-date
       $RedConn1 = $commandConn1.ExecuteReader()
       If($bRead -and $RedConn1.HasRows)
         {      
            $StartReadConn1 = get-date
            while ($RedConn1.Read())
            {
              $Null = [System.Text.Encoding]::UTF8.GetString([GZipHelper]::Decompress($RedConn1.GetSqlBinary(0).Value))
            }  
            $EndReadConn1 = get-date
         }

     $endConn1 = get-date
     $lNewTime1 = (New-TimeSpan -Start $startConn1 -End $endConn1).TotalMilliseconds
     $dataConn1 = $connectionConn1.RetrieveStatistics()
     If( $RedConn1.IsClosed -eq $false) { $RedConn1.Close() }

    
     $lBytesRecdConn1 = $lBytesRecdConn1 + $dataConn1.BytesReceived
  
  
     $connectionConn2.ResetStatistics()
     $startConn2 = get-date
       $RedConn2 = $commandConn2.ExecuteReader()
       If($bRead -and $RedConn2.HasRows)
         {      
            $StartReadConn2 = get-date
            while ($RedConn2.Read())
            {
              $Null = $RedConn2.GetSqlBinary(0).Value
            }  
            $EndReadConn2 = get-date
         }

     $endConn2 = get-date
     $lNewTime2 = (New-TimeSpan -Start $startConn2 -End $endConn2).TotalMilliseconds
     $dataConn2 = $connectionConn2.RetrieveStatistics()
     If( $RedConn2.IsClosed -eq $false) { $RedConn2.Close() }

       write-Output ("-----------------" )
       write-Output ("Details  : Serv1 vs Serv2") 
       write-Output ("Iteration: " +$i) 
       write-Output ("Spent(ms): " +$lNewTime1 + " vs " + $lNewTime2)
       write-Output ("Exec.(ms): " +$dataConn1.ExecutionTime + " vs " +$dataConn2.ExecutionTime) 
       write-Output ("NTW.(ms) : " +$dataConn1.NetworkServerTime + " vs " +$dataConn2.NetworkServerTime) 
       write-Output ("Read.(ms): " +(New-TimeSpan -Start $StartReadConn1 -End $EndReadConn1).TotalMilliseconds + " vs " +(New-TimeSpan -Start $StartReadConn2 -End $EndReadConn2).TotalMilliseconds)
       write-Output ("Rnd.Trips: " +$dataConn1.ServerRoundtrips + " vs " + $dataConn2.ServerRoundtrips) 
       write-Output ("Buf.Recd : " +$dataConn1.BuffersReceived + " vs " + $dataConn2.BuffersReceived) 
       write-Output ("Byt.Recd : " +$dataConn1.BytesReceived + " vs " + $dataConn2.BytesReceived) 
       write-Output ("KB Recd  : " +($dataConn1.BytesReceived/1024) + " vs " + ($dataConn2.BytesReceived/1024))
       write-Output ("AvgRecd  : " +(($lBytesRecdConn1/1024))/$i)

       $lNewTimeTotal1 = $lNewTimeTotal1 + $lNewTime1
       $lNewTimeTotal2 = $lNewTimeTotal2 + $lNewTime2

       write-Output ("Spent(TT): " +$lNewTimeTotal1/$i + " vs " + $lNewTimeTotal2/$i)
       
     }
    catch
   {
    Write-Output -ForegroundColor DarkYellow "Error at execution" 
    Write-Output -ForegroundColor Magenta $Error[0].Exception 
   }
  } 
 }
 catch
 {
    Write-Output -ForegroundColor DarkYellow "Error at connection" 
    Write-Output -ForegroundColor Magenta $Error[0].Exception 
 }
 finally
 {
  $connectionConn1.Close()
  $connectionConn2.Close()
 }
write-Output ("Time spent (ms) Procces :  " +$sw.elapsed) 
write-Output ("Review: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/provider-statistics-for-sql-server") 

 

 

 

The code is used to measure and compare the download transfer rate for two different SQL Server connections: one using a proxy and another using a redirect. The code also includes a helper class named "GZipHelper" that contains a method for decompressing data using the GZip algorithm.

 

Inside the try-catch block, a stopwatch is started to measure the overall execution time.

 

Two SQL queries are defined: "query1" and "query2". These queries retrieve data from the table "A", where "query2" includes the use of the "DECOMPRESS" function to decompress the "Data" column.

 

Two SQL commands are created, associated with their respective connections, and prepared with the defined queries.

Variables are initialized to keep track of the bytes received for each connection.

 

The connections are opened, and a loop is executed for the specified number of iterations. Inside the loop, the following steps are performed for each iteration:

 

1. The statistics of the first connection are reset, and the start time is recorded.

2. The first query is executed, and if the "bRead" flag is set to true and there are rows returned, a loop is executed to read each row. Within this loop, the "Data" column is decompressed using the "GZipHelper" class and stored in a variable named "$Null". This line can be modified to perform different operations with the decompressed data.

3. The end time is recorded, and various statistics and information about the first connection are retrieved.

4. The statistics of the second connection are reset, and the start time is recorded.

5. The second query is executed, and if the "bRead" flag is true and there are rows returned, a loop is executed to read each row. The "Data" column is retrieved as is, without decompression, and stored in a variable named "$Null". This line can also be modified to perform different operations with the data.

6. The end time is recorded, and various statistics and information about the second connection are retrieved.

7. Detailed output is written to the console, comparing the execution times, network server times, read times, server roundtrips, buffers received, bytes received, and average received bytes per iteration for both connections.

8. The total execution times for both connections are updated.

 

This PowerShell code helps measure and compare the performance of two SQL Server connections and provides insights into the download transfer rates and other performance-related statistics. My lesson Learned: The Proxy connection policy took more time that redirect every time that I call the method read of the DataReader.

 

Other code that reduces the time is using a DataSet. 

 

 


$SrvNameConn1 = "SrvUsingProxy.database.windows.net"
$SrvNameConn2 = "SrvUsingRedirect.database.windows.net"

$Database = "dbName"
$Username = "UserName"
$Password = "Pwd" 
$NumberExecutions = 10000

$bRead= $true

cls

try
{
  $sw = [diagnostics.stopwatch]::StartNew()
  $query = "SET NOCOUNT ON;SELECT Data FROM A"

  $connectionStringConn1 = "Server=tcp:$SrvNameConn1,1433;Initial Catalog=$Database;Persist Security Info=False;User ID=$Username;Password=$Password;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=30;Pooling=False;Packet Size=8000;TrustServerCertificate=True;Application Name='Test MS APP#1'"
  $connectionConn1 = New-Object -TypeName System.Data.SqlClient.SqlConnection($connectionStringConn1)
  $connectionConn1.StatisticsEnabled = 1 

  $commandConn1 = New-Object -TypeName System.Data.SqlClient.SqlCommand
  $commandConn1.CommandTimeout = 60
  $commandConn1.Connection=$connectionConn1
  $commandConn1.CommandText = $query
  $commandConn1.Prepare()

  $connectionStringConn2 = "Server=tcp:$SrvNameConn2,1433;Initial Catalog=$Database;Persist Security Info=False;User ID=$Username;Password=$Password;MultipleActiveResultSets=False;Encrypt=True;Connection Timeout=30;Pooling=False;Packet Size=8000;TrustServerCertificate=True;Application Name='Test MS APP#2'"
  $connectionConn2 = New-Object -TypeName System.Data.SqlClient.SqlConnection($connectionStringConn2)
  $connectionConn2.StatisticsEnabled = 1 
  
  $commandConn2 = New-Object -TypeName System.Data.SqlClient.SqlCommand
  $commandConn2.CommandTimeout = 60
  $commandConn2.Connection=$connectionConn2
  $commandConn2.CommandText = $query
  $commandConn1.Prepare()

  $lBytesRecdConn1 = 0
  $lBytesRecdConn2 = 0

  $connectionConn1.Open()
  $connectionConn2.Open()
  
  $lNewTime1 = 0
  $lNewTime2 = 0

  $lNewTimeTotal1 = 0
  $lNewTimeTotal2 = 0

  
  $SqlAdapter1 = New-Object System.Data.SqlClient.SqlDataAdapter
  $SqlAdapter2 = New-Object System.Data.SqlClient.SqlDataAdapter

  $DataSet1 = New-Object System.Data.DataSet
  $DataSet2 = New-Object System.Data.DataSet

  $SqlAdapter1.SelectCommand = $commandConn1
  $SqlAdapter2.SelectCommand = $commandConn2


  write-Output ("Query: " +$query) 

  for ($i=1; $i -le $NumberExecutions; $i++)
  {
   try
    {
  

     $connectionConn1.ResetStatistics()
     $startConn1 = get-date
     $Null = $SqlAdapter1.Fill($DataSet1)
     $endConn1 = get-date
     $lNewTime1 = (New-TimeSpan -Start $startConn1 -End $endConn1).TotalMilliseconds
     $dataConn1 = $connectionConn1.RetrieveStatistics()
     $DataSet1.Clear()

    
     $lBytesRecdConn1 = $lBytesRecdConn1 + $dataConn1.BytesReceived
  

  
     $connectionConn2.ResetStatistics()
     $startConn2 = get-date
     $SqlAdapter2.SelectCommand = $commandConn2
     $Null = $SqlAdapter2.Fill($DataSet2)
     $endConn2 = get-date
     $lNewTime2 = (New-TimeSpan -Start $startConn2 -End $endConn2).TotalMilliseconds
     $dataConn2 = $connectionConn2.RetrieveStatistics()
     $DataSet2.Clear()


       write-Output ("-----------------" )
       write-Output ("Details  : Srv1 vs Srv2") 
       write-Output ("Iteration: " +$i) 
       write-Output ("Spent(ms): " +$lNewTime1 + " vs " + $lNewTime2)
       write-Output ("Exec.(ms): " +$dataConn1.ExecutionTime + " vs " +$dataConn2.ExecutionTime) 
       write-Output ("NTW.(ms) : " +$dataConn1.NetworkServerTime + " vs " +$dataConn2.NetworkServerTime) 
       write-Output ("Rnd.Trips: " +$dataConn1.ServerRoundtrips + " vs " + $dataConn2.ServerRoundtrips) 
       write-Output ("Buf.Recd : " +$dataConn1.BuffersReceived + " vs " + $dataConn2.BuffersReceived) 
       write-Output ("Byt.Recd : " +$dataConn1.BytesReceived + " vs " + $dataConn2.BytesReceived) 
       write-Output ("KB Recd  : " +($dataConn1.BytesReceived/1024) + " vs " + ($dataConn2.BytesReceived/1024))
       write-Output ("AvgRecd  : " +(($lBytesRecdConn1/1024))/$i)

       $lNewTimeTotal1 = $lNewTimeTotal1 + $lNewTime1
       $lNewTimeTotal2 = $lNewTimeTotal2 + $lNewTime2

       write-Output ("Spent(TT): " +$lNewTimeTotal1/$i + " vs " + $lNewTimeTotal2/$i)
       
     }
    catch
   {
    Write-Output -ForegroundColor DarkYellow "Error at execution" 
    Write-Output -ForegroundColor Magenta $Error[0].Exception 
   }
  } 
 }
 catch
 {
    Write-Output -ForegroundColor DarkYellow "Error at connection" 
    Write-Output -ForegroundColor Magenta $Error[0].Exception 
 }
 finally
 {
  $connectionConn1.Close()
  $connectionConn2.Close()
 }
write-Output ("Time spent (ms) Procces :  " +$sw.elapsed) 
write-Output ("Review: https://docs.microsoft.com/en-us/dotnet/framework/data/adonet/sql/provider-statistics-for-sql-server") 

 

 

 

The main different with previous PowerShell Script is using DataSet to download all the rows at the same time in compressed mode reducing the latency. 

 

Once I have the data downloaded and COMPRESS function COMPRESS (Transact-SQL) - SQL Server | Microsoft Learn "compresses the input expression, using the Gzip algorithm. The function returns a byte array of type varbinary(max)." basically I need to call the GZipHelper to uncompress the data in my application reducing latency in network and space in my database."

 

Enjoy!

1 Comment
Version history
Last update:
‎Jun 16 2023 12:29 PM
Updated by: