Lesson Learned #377:Running Multiple Instances of SqlBulkCopy in Parallel from PowerShell
Published Jun 21 2023 04:05 AM 1,868 Views

Today, we encountered an interesting service request of attempting to reduce the load times for 100,000 records from a table with 97 varchar(320) fields in an Azure SQL HyperScale database. Following, I would like to share my lessons learned here.

 

The idea is to split in different concurrent process the execution of multiples SqlBulkCopy. In this case, we are going to split this process in 5 processes running in parallel inserting 20,000 rows, let's try to know the total size. 

 

Amount of data

 

If we consider 97 fields with a data type of char(320) and one field of type integer, the total size in bytes for one record would be:

 

  • Size per record = (97 * 320) bytes + 4 bytes = 31,040 bytes + 4 bytes = 31,044 bytes
  • For 20,000 records, the total size in bytes would be:
    • Total size in bytes = 31,044 bytes * 20,000 = 620,880,000 bytes
    • To convert it to megabytes, we divide the size in bytes by 1,048,576 (1 megabyte = 1,048,576 bytes):
    • Total size in megabytes = 620,880,000 bytes / 1,048,576 = 591.58 megabytes
    • Therefore, the total size would be approximately 591.58 megabytes.

 

If we consider 97 fields with a data type of char(320) and one field of type integer, the total size in bytes for 20,000 records would be approximately 591.58 megabytes per process.

 

DoneThreads.PS1

 

The first script will be DoneThreads.PS1. This script performs parallel execution of multiple instances of a script called "FinalFile.ps1" using background jobs. The script includes a function called "logMsgParallel" for logging operations and a function called "GiveID" to manage the execution of background jobs.

 

Here is a detailed explanation of the script:

 

  • 1. The "logMsgParallel" function is defined, which takes several parameters including the message to be logged, color, display options, and newline settings. It logs messages to the console with different colors based on the provided parameters.
  • 2. The "GiveID" function is defined, which takes parameters for the maximum number of jobs and the number of executed jobs. It retrieves the status of background jobs and determines if any jobs have finished executing. It returns true if the maximum number of jobs running concurrently has been reached.
  • 3. The main part of the script begins with clearing the console and initializing variables.
  • 4. The variable "$integerNumberOfConcurrentTasks" is set to 5, indicating the maximum number of concurrent tasks to run.
  • 5. The variable "$HowManyAtSameTime" is set to 5, indicating the number of tasks to start simultaneously.
  • 6. Any previous background jobs running in the current PowerShell session are stopped and removed.
  • 7. Information about user parameters, such as the number of concurrent jobs, is logged using the "logMsgParallel" function.
  • 8. The script enters a loop where it checks if the maximum number of concurrent tasks has been reached using the "GiveID" function. If the limit has not been reached, a background job is started using the "Start-Job" cmdlet, which runs the "FinalFile.ps1" script.
  • 9. The loop continues until the maximum number of concurrent tasks is reached. If the limit is reached, the script waits for 1 second before checking again.
  • 10. Error handling is implemented to log any errors that occur during the execution of the script.
  • This script provides a framework for running multiple instances of the "FinalFile.ps1" script concurrently using background jobs and logging the operations.

 

 


#--------------------------------
#Log the operations of this script.
#--------------------------------
function logMsgParallel
{
    Param
    (
         [Parameter(Mandatory=$false, Position=0)]
         [string] $msg,
         [Parameter(Mandatory=$false, Position=1)]
         [int] $Color,
         [Parameter(Mandatory=$false, Position=2)]
         [boolean] $Show=$true,
         [Parameter(Mandatory=$false, Position=3)]
         [boolean] $ShowDate=$true,
         [Parameter(Mandatory=$false, Position=4)]
         [boolean] $NewLine=$true 
 
    )
  try
   {
    If(TestEmpty($msg))
    {
     $msg = " "
    }

    if($ShowDate -eq $true)
    {
      $Fecha = Get-Date -format "yyyy-MM-dd HH:mm:ss"
    }
    $msg = $Fecha + " " + $msg

    $Colores="White"

    If($Color -eq 1 )
     {
      $Colores ="Cyan"
     }
    If($Color -eq 3 )
     {
      $Colores ="Yellow"
     }
    If($Color -eq 4 )
     {
      $Colores ="Green"
     }
    If($Color -eq 5 )
     {
      $Colores ="Magenta"
     }

     if($Color -eq 2 -And $Show -eq $true)
      {
         if($NewLine)
         {
           Write-Host -ForegroundColor White -BackgroundColor Red $msg 
         }
         else
         {
          Write-Host -ForegroundColor White -BackgroundColor Red $msg -NoNewline
         }
      } 
     else 
      {
       if($Show -eq $true)
       {
        if($NewLine)
         {
           Write-Host -ForegroundColor $Colores $msg 
         }
        else
         {
           Write-Host -ForegroundColor $Colores $msg -NoNewline
         }  
       }
      } 

   }
  catch
  {
    Write-Host $msg 
  }
}

#--------------------------------------------------------------------------
#Obtain the status for every BackgroundJob and execute the remaining ones
#--------------------------------------------------------------------------

Function GiveID(
              [Parameter(Mandatory=$false)] [int]$lMax,
              [Parameter(Mandatory=$false)] [int]$lExecuted)
{ 
 try
 {

  [int]$lPending=0
  [string]$Msg=""

  $Jobs = Get-Job ##Obtain the jobs that PowerShell is executing.

  if($lMax -eq -1 ) ##if we don't have the number of jobs to run the default will be the number of client machine processor.
  {
   $lMax = [int]$env:NUMBER_OF_PROCESSORS
  }

  ##----------------------------------------------
  ##For every job we're going to know the current status and if the process finishes.In case that any job finishes we are going 
  ##to run a new one until the maximum number of loops
  ##----------------------------------------------
  
  ForEach ($di in $Jobs)                         
  {
    if($di.State -eq "Running")
    {
      $lPending=$lPending+1
    }
    $Msg=$("Checking..." + $di.Name + " (" + $di.State + ") of total concurrent " + $lMax.ToString() + " jobs - Executed Already - " + $lExecuted.ToString())
    logMsgParallel -msg $Msg (4)
    ##$Data = Receive-Job $di.Id - if we need to know what is the outcome of the job, uses Receive-Job...
  } 
  if($lPending -lt $lMax) 
    {return $true} ##We reached the maximum of allowed jobs running at the same time.
  
  return {$false}
 }
 catch
  {
 
  }
}


try
{
 clear

 $i=0;

 $integerNumberOfConcurrentTasks = 5
 $HowManyAtSameTime = 5

 $Jobs = Get-Job ##Obtain all the process under this main session of powershell
 Foreach ($di in $Jobs) ##We need to close and kill other previous process.
 {
   logMsgParallel $('Stopping previous scenario: ' + $di.Name) (3)
   Stop-Job $di.Id
   logMsgParallel $('Removing previous scenario: ' + $di.Name) (3)
   Remove-Job $di.Id
 } 

 logMsgParallel $('-------------- User Parameters -----------') (3)
 logMsgParallel $('Concurrent Jobs:              ' + $integerNumberOfConcurrentTasks.ToString()) (3)
 logMsgParallel $('------------------------------------------') (3)

 while ($i -lt $integerNumberOfConcurrentTasks)
 {
  if((Giveid $HowManyAtSameTime $i) -eq $true) ##How many process do you want to run at the same time.
  {
   logMsgParallel $("Starting up the scenario: " ) (1)
    $Null = Start-Job -FilePath "C:\SourceCode\FinalFile.ps1" 
   logMsgParallel $("Started the scenario: ---" + $NameofApp ) (1)
   $i=$i+1;
  }
  else
  {
    logMsgParallel ("Limit of concurrent process reached. Waiting for completion in 5 seconds") (3)
    Start-sleep -Seconds 1
  }
 }
}
 catch
   {
    logMsgParallel("Error executing this process..." + $Error[0].Exception) (2)
   }

 

 

FinalFile PowerShell Script

 

The second script will be FinalFile.PS1,  that connects to a SQL Server database and performs bulk data insertion using the SqlBulkCopy class.

Here is a detailed explanation of the script:

 

  • 1. The script starts by defining several variables that store connection information, such as the server name, database name, username, and password. There is also a constant text string used later in the script.
  • 2. The "logMsg" function is defined, which is used for logging messages. It takes a mandatory message parameter and an optional InfoDate parameter to include the timestamp in the log output.
  • 3. The connection string for the SQL Server database is constructed using the provided connection information.
  • 4. A new SqlConnection object is created using the connection string, and the connection is opened using the "Open" method.
  • 5. An instance of the SqlBulkCopy class is created using the SqlConnection object. The "EnableStreaming" property is set to "true" to enable streaming mode. The "DestinationTableName" property is set to "TempTable", specifying the destination table for the bulk copy. Other properties, such as "BatchSize" and "BulkCopyTimeout," are also set to control the bulk copy operation.
  • 6. Logging is performed to indicate that columns are being created.
  • 7. A new DataTable object is created to hold the data to be bulk copied. The script generates columns dynamically using a loop. The first column is an integer column with a default value of 1. The subsequent columns are created as string columns with a maximum length of 320 characters and a default value of the constant text string.
  • 8. Another loop is used to add rows to the DataTable. In this case, the script adds 20,000 rows, each with the previously defined columns and their default values.
  • 9. The script measures the time taken to create columns and save a template row.
  • 10. The script then enters another loop, where it performs the bulk copy operation. In this case, the loop is executed only once. Logging is performed to indicate the start of the process. The time is measured before and after the WriteToServer method is called to copy the data from the DataTable to the destination table using bulk copy. The duration of the process is logged.
  • 11. The script measures the total time taken to add rows using bulk copy.
  • 12. The connection to the database is closed.

The script primarily focuses on the bulk data insertion process using SqlBulkCopy, dynamically creating columns and populating rows in a DataTable, and logging relevant information at different stages of the process.

 

 

 $DatabaseServerWrite = "tcp:ServerName.database.windows.net,1433"
 $DatabaseWrite = "DbName"
 $UsernameWrite = "UserName"
 $PasswordWrite = "Password" 
 $ConstText = "This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test. This is a test."

function logMsg
{
    Param
    (
         [Parameter(Mandatory=$true, Position=0)]
         [string] $msg = "",
         [Parameter(Mandatory=$false, Position=2)] 
         [int] $InfoDate = 0

    )
  try
   {
    If($InfoDate -eq 0) { Write-Output $msg }
    else {Write-Output ((Get-Date -format "yyyy-MM-dd HH:mm:ss") + $msg) } 
   }
  catch
  {
    Write-Output $msg 
  }
}


cls


  $connectionStringWrite = "Server=$DatabaseServerWrite;Initial Catalog=$DatabaseWrite;Persist Security Info=False;User ID=$UsernameWrite;Password=$PasswordWrite;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Connection Timeout=30;Packet Size=32000;Pooling=True;Application Name=Test-Write"

  $connectionWrite = New-Object -TypeName System.Data.SqlClient.SqlConnection($connectionStringWrite)
  $connectionWrite.Open()

  ##$SqlBulkCopy = New-Object -TypeName System.Data.SqlClient.SqlBulkCopy($connectionWrite, ([System.Data.SqlClient.SqlBulkCopyOptions]::TableLock -bor [System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction), $null)
 
  $SqlBulkCopy = New-Object -TypeName System.Data.SqlClient.SqlBulkCopy($connectionWrite,([System.Data.SqlClient.SqlBulkCopyOptions]::UseInternalTransaction), $null)
  $SqlBulkCopy.EnableStreaming = $true
  $SqlBulkCopy.DestinationTableName = "TempTable"
  $SqlBulkCopy.BatchSize = 5000
  $SqlBulkCopy.BulkCopyTimeout = 30000
  
  logMsg ("Creating columns...")
  # Create the datatable, and autogenerate the columns.
  $datatable = New-Object System.Data.DataTable
  
  $start = get-date  

  $column = New-Object System.Data.DataColumn("N", [int])
  $column.DefaultValue = 1
  $datatable.Columns.Clear()
  $datatable.Columns.Add($column)

  for($i=4;$i -le 100;$i++) 
  { 
    $columnName = "COL$i"
    $column = New-Object System.Data.DataColumn($columnName, [string])
    $column.MaxLength = 320
    $column.DefaultValue = $ConstText
    $datatable.Columns.Add($column)
  } 

  for($RowCounter=1;$RowCounter -le 20000;$RowCounter++)
  {
    $row = $datatable.NewRow()   
    #$row[0]=$RowCounter
    #for($c=4;$c -le 97;$c++)
    #{
    #   $row[($c-3)] = $ConstText
    #}      
    $datatable.Rows.Add($row)
  }

  $end = get-date
  logMsg ("Created columns and save template row (ms): " + (New-TimeSpan -Start $start -End $end).TotalMilliseconds.ToString())
    
  $start = get-date
  for($i=1;$i -le 1;$i++)
  {
    logMsg ("Starting Process: " + $i)
    $startProcess = get-date
      $SqlBulkCopy.WriteToServer($datatable)
    $endProcess = get-date
    logMsg ("Finished Process : " + (New-TimeSpan -Start $startProcess -End $endProcess).TotalMilliseconds.ToString() )
  }
  $end = get-date
  logMsg ("Rows added (ms): " + (New-TimeSpan -Start $start -End $end).TotalMilliseconds.ToString() )
  $connectionWrite.Close()

 

 

Table Definition

 

IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[TempTable]') AND type in (N'U'))
DROP TABLE [TempTable]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE TABLE [TempTable](
	[N] [int] not NULL,
	[COL4] [varchar](320) NULL,
	[COL5] [varchar](320) NULL,
	[COL6] [varchar](320) NULL,
	[COL7] [varchar](320) NULL,
	[COL8] [varchar](320) NULL,
	[COL9] [varchar](320) NULL,
	[COL10] [varchar](320) NULL,
	[COL11] [varchar](320) NULL,
	[COL12] [varchar](320) NULL,
	[COL13] [varchar](320) NULL,
	[COL14] [varchar](320) NULL,
	[COL15] [varchar](320) NULL,
	[COL16] [varchar](320) NULL,
	[COL17] [varchar](320) NULL,
	[COL18] [varchar](320) NULL,
	[COL19] [varchar](320) NULL,
	[COL20] [varchar](320) NULL,
	[COL21] [varchar](320) NULL,
	[COL22] [varchar](320) NULL,
	[COL23] [varchar](320) NULL,
	[COL24] [varchar](320) NULL,
	[COL25] [varchar](320) NULL,
	[COL26] [varchar](320) NULL,
	[COL27] [varchar](320) NULL,
	[COL28] [varchar](320) NULL,
	[COL29] [varchar](320) NULL,
	[COL30] [varchar](320) NULL,
	[COL31] [varchar](320) NULL,
	[COL32] [varchar](320) NULL,
	[COL33] [varchar](320) NULL,
	[COL34] [varchar](320) NULL,
	[COL35] [varchar](320) NULL,
	[COL36] [varchar](320) NULL,
	[COL37] [varchar](320) NULL,
	[COL38] [varchar](320) NULL,
	[COL39] [varchar](320) NULL,
	[COL40] [varchar](320) NULL,
	[COL41] [varchar](320) NULL,
	[COL42] [varchar](320) NULL,
	[COL43] [varchar](320) NULL,
	[COL44] [varchar](320) NULL,
	[COL45] [varchar](320) NULL,
	[COL46] [varchar](320) NULL,
	[COL47] [varchar](320) NULL,
	[COL48] [varchar](320) NULL,
	[COL49] [varchar](320) NULL,
	[COL50] [varchar](320) NULL,
	[COL51] [varchar](320) NULL,
	[COL52] [varchar](320) NULL,
	[COL53] [varchar](320) NULL,
	[COL54] [varchar](320) NULL,
	[COL55] [varchar](320) NULL,
	[COL56] [varchar](320) NULL,
	[COL57] [varchar](320) NULL,
	[COL58] [varchar](320) NULL,
	[COL59] [varchar](320) NULL,
	[COL60] [varchar](320) NULL,
	[COL61] [varchar](320) NULL,
	[COL62] [varchar](320) NULL,
	[COL63] [varchar](320) NULL,
	[COL64] [varchar](320) NULL,
	[COL65] [varchar](320) NULL,
	[COL66] [varchar](320) NULL,
	[COL67] [varchar](320) NULL,
	[COL68] [varchar](320) NULL,
	[COL69] [varchar](320) NULL,
	[COL70] [varchar](320) NULL,
	[COL71] [varchar](320) NULL,
	[COL72] [varchar](320) NULL,
	[COL73] [varchar](320) NULL,
	[COL74] [varchar](320) NULL,
	[COL75] [varchar](320) NULL,
	[COL76] [varchar](320) NULL,
	[COL77] [varchar](320) NULL,
	[COL78] [varchar](320) NULL,
	[COL79] [varchar](320) NULL,
	[COL80] [varchar](320) NULL,
	[COL81] [varchar](320) NULL,
	[COL82] [varchar](320) NULL,
	[COL83] [varchar](320) NULL,
	[COL84] [varchar](320) NULL,
	[COL85] [varchar](320) NULL,
	[COL86] [varchar](320) NULL,
	[COL87] [varchar](320) NULL,
	[COL88] [varchar](320) NULL,
	[COL89] [varchar](320) NULL,
	[COL90] [varchar](320) NULL,
	[COL91] [varchar](320) NULL,
	[COL92] [varchar](320) NULL,
	[COL93] [varchar](320) NULL,
	[COL94] [varchar](320) NULL,
	[COL95] [varchar](320) NULL,
	[COL96] [varchar](320) NULL,
	[COL97] [varchar](320) NULL,
	[COL98] [varchar](320) NULL,
	[COL99] [varchar](320) NULL,
	[COL100] [varchar](320) NULL
) 

 

My lesson learned was that running 5 process with a batch size process of 5000 the process for 100,000 rows will be around of 3 GB of data took around 50 seconds in a Heap Table.

 

Enjoy!

Version history
Last update:
‎Jun 21 2023 04:04 AM
Updated by: