EDIT (28-June-2019): Note that this article applies to a version of SAP that was current at the time. Modern versions of the SAP migration tool SMIGR_CREATE_DDL no longer benefit from Trace Flag 715.
SAP provides R3load process to perform export/import of SAP data to SQL Server for homogenous (SQL Server to SQL Server) or heterogeneous migration from different OS or database platform (UNIX/Oracle). By default, R3Load process performs parallel bulk load operations while importing the data in SQL Server. With SQL Server 2014 SP1 CU8 , SQL Server 2014 SP2 CU1 and upcoming servicing release of SQL 2016 RTM, a new trace flag 715 is introduced which significantly improves the overall data migration throughput of SAP system data to SQL Server thereby reducing the data migration time by 60% for heaps with no non-clustered index .
Following graph shows the comparison of data migration throughput and elapsed time improvement with TF 715 in SQL Server 2014 SP1 CU8 for the same amount of data loaded by R3Load process.
TF 715 can be set at the instance-level (global) or session level which enables TABLOCK for bulk load operations into heap with no non-clustered indexes . When this trace flag is enabled, bulk load operations acquires bulk update (BU) locks by default when bulk copying data into a heap with no non-clustered index. Bulk update (BU) locks allow multiple threads to bulk load data concurrently into the same table while preventing other processes that are not bulk loading data from accessing the table. The behavior is similar to when the user explicitly specifies TABLOCK hint while performing bulk load or when the sp_tableoption ‘table lock on bulk load’ is on for a given table however enabling this TF makes this behavior default without making any query changes or database changes .
Parallel Bulk Load Workload
TF 715 significantly enhances the performance of parallel bulk load operation on a heap with no non-clustered index. The following chart compares the data load performance improvement of 4 parallel bulk load session inserting 10 millions records each on a sample heap table with no indexes. The total elapsed time with TF 715 is around 1/10 the time taken by bulk insert without TABLOCK.
The performance gain with TF 715 is achieved due to following
Reduced PFS contention and Allocation overhead – During bulk load operation on a table, a PFS page is touched roughly 3 times for every new page allocated for the table. So while inserting 1GB of data (131072 pages), the worker thread performing bulk load operation tries to touch or acquire latch on PFS pages, 393,216 times (131072 *3 ).With 4 parallel bulk load threads, the PFS pages are touched 1,572,864 times (131072*3 *4 ). As the parallel threads performing bulk load operations increases, PFS page contention increases on the database file reducing the overall throughput of the data load performance. When table lock (TABLOCK) is acquired while performing bulk load operations on heap with no non-clustered index , the PFS page is touched only once per extent (once per 8 pages) which reduces the allocation overhead by 1/24 which significantly improves the throughput and concurrency of parallel bulk insert operations reducing the data load time.
Bulk Update (BU) lock – When table lock (TABLOCK) is acquired by bulk load operations, BU lock is acquired at the table level by the worker threads performing bulk load operations compared to page and row level locks acquired without TABLOCK hint. Bulk update (BU) locks allow multiple threads to bulk load data concurrently into the same table while preventing other processes that are not bulk loading data from accessing the table.The BU lock at table level reduces the locking overhead allowing parallelism in bulk load operation which help to improve the overall throughput reducing the total data migration time.
Minimally Logging – if the database is in Simple or Bulk logged recovery mode , bulk load with TABLOCK into heap results into minimally logged operation which significantly reduces the logging overhead and I/O involved which further enhances the data load performance.
TF 715 is documented trace flag and will be supported by Microsoft. Overall turning on Trace flag 715 will be beneficial to improve the data load performance while loading the data in a heap with no index. The benefits are similar to adding a TABLOCK hint while performing bulk insert operation but in scenarios where some of the third party tools doesn’t allow customization to add the TABLOCK hint, trace flag 715 can be useful as it turns on the behavior out of the box without any application changes.
When NOT to enable trace flag 715?
If you have scenario where a DML or Select query is executed on heap table concurrently with bulk load operation, TF 715 can block sessions performing DML or select query on the table due to the table level lock. This is the scenario where Trace flag 715 can limit the overall concurrency of the table and should NOT be turned ON . Hence it is recommended to use this trace flag to optimize data load scenarios and should be turned off otherwise.