Using SSIS to load 1TB data into SQL Server in 30 mins, with simplified settings

Mar 25, 2019

First published on MSDN on Oct 13, 2016
In 2008, the SSIS team posted an article about loading 1 TB of data in 30 minutes with SSIS (https://technet.microsoft.com/en-us/library/dd537533(v=sql.100).aspx). In the eight years since, hardware and software have improved rapidly, so we repeated a similar experiment using only two servers and achieved the same performance. With SSIS 2016, we loaded 1 TB of data in 30 minutes (and a 1.5 TB dataset in 43 minutes). We also tested loading data into a table with a columnstore index; the details are at the end of this article.

The design of this experiment is almost the same as in 2008, as shown in Figure 1. We use database partitions and run multiple SSIS instances to ingest data in parallel.

Figure 1

Each package writes to a different partition in the destination tables. More precisely, as illustrated in Figure 2, each package writes into a separate table for the highest performance, and these tables are then “switched in” to partitions of the larger table. This is described more fully in the section on database setup. Partitioning a table is good practice in a number of situations, one of them being when multiple large insertions must be performed concurrently.

Figure 2
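The "switch in" step can be sketched in T-SQL as follows. This is a minimal illustration, not the setup used in the experiment: the names pf_demo, ps_demo, dbo.main_demo and dbo.stage_2 are hypothetical, the partitioning column is an assumed integer key, and every partition is placed on [PRIMARY] so the script can run in any scratch database.

```sql
-- Hypothetical partition function: 4 partitions split on an integer key.
CREATE PARTITION FUNCTION pf_demo (INT) AS RANGE LEFT FOR VALUES (1, 2, 3);
GO
-- For the sketch, map every partition to PRIMARY (the experiment uses many filegroups).
CREATE PARTITION SCHEME ps_demo AS PARTITION pf_demo ALL TO ([PRIMARY]);
GO
-- Partitioned main table, created as a heap (no clustered index).
CREATE TABLE dbo.main_demo (
    part_id INT NOT NULL,
    payload VARCHAR(100) NULL
) ON ps_demo (part_id);

-- Staging table that one package would load. It must have the same columns
-- and live on the same filegroup as the partition it will be switched into.
CREATE TABLE dbo.stage_2 (
    part_id INT NOT NULL,
    payload VARCHAR(100) NULL
) ON [PRIMARY];

INSERT INTO dbo.stage_2 (part_id, payload)
VALUES (2, 'row loaded by package 2');

-- A trusted CHECK constraint proves that all rows fit partition 2 (1 < part_id <= 2).
ALTER TABLE dbo.stage_2 WITH CHECK
    ADD CONSTRAINT ck_stage_2 CHECK (part_id > 1 AND part_id <= 2);

-- Metadata-only operation: the staging table's rows become partition 2 of the main table.
ALTER TABLE dbo.stage_2 SWITCH TO dbo.main_demo PARTITION 2;

-- Show how many rows each partition of the main table now holds.
SELECT $PARTITION.pf_demo(part_id) AS partition_number, COUNT(*) AS row_count
FROM dbo.main_demo
GROUP BY $PARTITION.pf_demo(part_id);
```

Because the switch only changes metadata, the expensive part (the bulk insert) happens in the separate staging tables, where the packages do not contend with each other.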

Because we use only two servers and the server configuration is different now, we simplified the NUMA, network, and disk settings.

Test Environment

We use two servers with identical hardware. One is dedicated to SQL Server and the other is dedicated to the SSIS instances. The two servers are connected by a 10 GbE link. Each server has 4 SSD drives: on the SQL Server machine they store the SQL Server filegroups, and on the SSIS machine they store the flat source files.

Server physical configuration:

CPU: 2 sockets each with 12 cores Intel Xeon 2.60GHz

Two physical NUMA nodes

Memory: 128GB

OS: Windows Server 2012 R2 64-bit

Disk: random read speed 324 MB/s per drive; random write speed 347 MB/s per drive

Network: 10 GbE link
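As a rough sanity check on these figures, the arithmetic below compares the sustained rate needed to load 1 TB in 30 minutes with the nominal link and drive rates. It assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and treats the quoted per-drive speeds as simply additive across the 4 drives. Both are assumptions for illustration, not measurements from the experiment, and the variable names are hypothetical.

```sql
-- Assumption: 1 TB = 10^12 bytes, loaded in 30 minutes.
DECLARE @bytes_loaded FLOAT = 1e12;
DECLARE @seconds      FLOAT = 30 * 60;

SELECT
    @bytes_loaded / @seconds / 1e6 AS required_mb_per_sec,   -- sustained rate the load needs
    10e9 / 8 / 1e6                 AS link_mb_per_sec,       -- nominal 10 GbE rate, ignoring protocol overhead
    4 * 324                        AS ssd_read_mb_per_sec,   -- 4 drives, assumed additive (source side)
    4 * 347                        AS ssd_write_mb_per_sec;  -- 4 drives, assumed additive (destination side)
```

Under these assumptions the load needs roughly 556 MB/s sustained, which is below the nominal 1250 MB/s of the 10 GbE link and below the combined nominal drive rates (1296 MB/s read, 1388 MB/s write).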

Database Setup

First we create the main table and the partitions within it (see Figure 3). The main table is a heap, that is, a table without a clustered index. Data in a heap is stored without any specified order, so SQL Server has less work to do on each insert, which makes a heap perform better for data loading. See https://msdn.microsoft.com/en-us/library/hh213609.aspx for details about heaps.

Figure 3
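To confirm that a table really is a heap, one option is to look it up in sys.indexes, where a heap appears as a single row with index_id 0. The sketch below uses a hypothetical table dbo.heap_demo rather than the main table from the experiment.

```sql
-- Hypothetical table created without a clustered index or primary key, so it is a heap.
CREATE TABLE dbo.heap_demo (
    id      INT NOT NULL,
    payload VARCHAR(100) NULL
);

-- A heap is listed in sys.indexes with index_id = 0 and type_desc = 'HEAP'.
SELECT i.index_id, i.type_desc
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID(N'dbo.heap_demo');
```

If the same query is run after adding a clustered index, the row with index_id 0 is replaced by a row with index_id 1 and type_desc CLUSTERED.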

Create Database:

The database is created on 48 filegroups, which are distributed across the 4 SSDs in round-robin fashion; each filegroup is 50 GB in size. We spread the filegroups over 4 disks to avoid a disk throughput bottleneck.

CREATE DATABASE sample ON

PRIMARY

( NAME = NYXTaxF0,

FILENAME = N'C:\SQL\NYXTaxFG0.mdf' ,

SIZE = 1GB, MAXSIZE = 1GB , FILEGROWTH = 10% ),

FILEGROUP FG1

( NAME = NYXTaxF1,

FILENAME = 'E:\SQL\NYXTaxFG1.mdf' ,