Azure SQL Blog

How to migrate data from regular tables to ledger tables

PieterVanhove
Apr 20, 2023

This article guides you through the steps to convert your existing data into tamper-proof data by using the ledger feature.


To enable ledger for a specific table, you can migrate the data from an existing regular table to a ledger table, and then replace the original table with the ledger table. We have built the tools to help you do that. To convert a regular table into a ledger table, Microsoft recommends using the sys.sp_copy_data_in_batches stored procedure. It copies data from the source table to the target ledger table (updatable or append-only) after verifying that their schemas are identical in terms of the number of columns, column names, and data types. Indexes between the tables can be different, but the target table can only be a heap or have a clustered index. To create a ledger table with a schema identical to the source table, you can script the existing table and add the ledger syntax to create either an updatable ledger table or an append-only ledger table.
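For illustration, here is a minimal sketch of that ledger syntax. The table and column names are made up; in practice, you would script your own source table to get the exact column list.

-- Updatable ledger table: requires system versioning; the history table
-- name below is an assumption for illustration.
CREATE TABLE [dbo].[MyTable_Ledger]
(
    [Id] INT NOT NULL,
    [Amount] DECIMAL(18, 2) NOT NULL
)
WITH (
    SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[MyTable_Ledger_History]),
    LEDGER = ON
);

-- Append-only ledger table: only INSERT operations are allowed.
CREATE TABLE [dbo].[MyTable_Ledger_AppendOnly]
(
    [Id] INT NOT NULL,
    [Amount] DECIMAL(18, 2) NOT NULL
)
WITH (LEDGER = ON (APPEND_ONLY = ON));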


The stored procedure splits the copy operation into batches of 10-100K rows per transaction. Why are these batches important? When you perform a database ledger verification, the process needs to order all operations within each transaction. With smaller batches, the verification deals with smaller transactions that can be sorted in parallel, which improves the verification time tremendously. If you use a SELECT INTO or BULK INSERT statement to copy a few billion rows from a regular table to a ledger table, everything happens in one single transaction. That means a large amount of data must be fully sorted, and that sort runs in a single thread, takes a long time to complete, and drives up the verification time. Secondly, the copy process of the stored procedure is more efficient than SELECT INTO or BULK INSERT: the operation uses parallelism and the logging optimizations traditionally used by index build operations.
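As an aside (this query is not from the original post), you can observe the batching yourself: every transaction that modifies a ledger table is recorded in the sys.database_ledger_transactions catalog view, so a batched copy shows up as many small transactions, while a single INSERT ... SELECT shows up as just one.

-- Count the transactions the database ledger has recorded so far.
SELECT COUNT(*) AS ledger_transaction_count
FROM sys.database_ledger_transactions;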


I ran a simple test to demonstrate the benefits of the stored procedure (and the smaller batches). The first test compares the performance of the data migration; the second compares the verification time for the two migration methods.


Setup and scenario

I restored the Microsoft sample database WideWorldImportersDW on a local SQL Server 2022 instance and generated extra data for performance testing. The source table [Fact].[Sale] contained approximately 24 million records.
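The post doesn't show how the extra data was generated. One simple way to inflate the row count (not necessarily what was used here) is to re-insert the table into itself a few times; each execution doubles the table:

-- Illustrative only: doubles the row count of [Fact].[Sale] on each run.
-- Assumes [Sale Key] is an IDENTITY column, as in the standard sample schema.
INSERT INTO [Fact].[Sale] ([City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key])
SELECT [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
FROM [Fact].[Sale];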

Next, I created two append-only ledger heap tables with exactly the same schema as the source table (a sketch of possible DDL follows the list):

  • [Fact].[Sale_Ledger] will be used to load the data with the sys.sp_copy_data_in_batches stored procedure.
  • [Fact].[Sale_Ledger_2] will be used to load the data in a single transaction by using INSERT INTO ... SELECT.
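The post doesn't include the DDL for these tables, but it could look like the following sketch. The data types are assumptions based on the standard WideWorldImportersDW schema, and [Fact].[Sale_Ledger_2] is created the same way.

-- Append-only ledger table, created as a heap (no clustered index).
-- The column list must match [Fact].[Sale] exactly.
CREATE TABLE [Fact].[Sale_Ledger]
(
    [City Key] INT NOT NULL,
    [Customer Key] INT NOT NULL,
    [Bill To Customer Key] INT NOT NULL,
    [Stock Item Key] INT NOT NULL,
    [Invoice Date Key] DATE NOT NULL,
    [Delivery Date Key] DATE NULL,
    [Salesperson Key] INT NOT NULL,
    [WWI Invoice ID] INT NOT NULL,
    [Description] NVARCHAR(100) NOT NULL,
    [Package] NVARCHAR(50) NOT NULL,
    [Quantity] INT NOT NULL,
    [Unit Price] DECIMAL(18, 2) NOT NULL,
    [Tax Rate] DECIMAL(18, 3) NOT NULL,
    [Total Excluding Tax] DECIMAL(18, 2) NOT NULL,
    [Tax Amount] DECIMAL(18, 2) NOT NULL,
    [Profit] DECIMAL(18, 2) NOT NULL,
    [Total Including Tax] DECIMAL(18, 2) NOT NULL,
    [Total Dry Items] INT NOT NULL,
    [Total Chiller Items] INT NOT NULL,
    [Lineage Key] INT NOT NULL
)
WITH (LEDGER = ON (APPEND_ONLY = ON));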

Once the data was loaded, I ran the database ledger verification on both tables separately to show the difference in verification time.


Migrate the data

First, I loaded the data with the sys.sp_copy_data_in_batches stored procedure. This took 3 min 42 sec to complete.


EXECUTE sys.sp_copy_data_in_batches
    @source_table_name = N'[Fact].[Sale]',
    @target_table_name = N'[Fact].[Sale_Ledger]';


Second, I loaded the data with an INSERT INTO ... SELECT statement. This took 7 min 12 sec to complete.


INSERT INTO [Fact].[Sale_Ledger_2]([City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key])
SELECT [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key] 
FROM [Fact].[Sale]


Ledger verification

The first verification, on the [Fact].[Sale_Ledger] table that was loaded in batches, had a total execution time of 1 min 31 sec.


DECLARE @digest_locations NVARCHAR(MAX) = (SELECT * FROM sys.database_ledger_digest_locations FOR JSON AUTO, INCLUDE_NULL_VALUES);
SELECT @digest_locations AS digest_locations;
BEGIN TRY
    EXEC sys.sp_verify_database_ledger_from_digest_storage @digest_locations, @table_name = N'[Fact].[Sale_Ledger]';
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    THROW;
END CATCH


The second verification, on the [Fact].[Sale_Ledger_2] table that was loaded in a single transaction, had a total execution time of 2 min 51 sec.


DECLARE @digest_locations NVARCHAR(MAX) = (SELECT * FROM sys.database_ledger_digest_locations FOR JSON AUTO, INCLUDE_NULL_VALUES);
SELECT @digest_locations AS digest_locations;
BEGIN TRY
    EXEC sys.sp_verify_database_ledger_from_digest_storage @digest_locations, @table_name = N'[Fact].[Sale_Ledger_2]';
    SELECT 'Ledger verification succeeded.' AS Result;
END TRY
BEGIN CATCH
    THROW;
END CATCH


Conclusion

As the test results show, both the data migration and the verification are almost twice as fast for the table loaded with the sp_copy_data_in_batches stored procedure as for the table loaded in a single transaction.
You can still use other commands, services, or tools to copy the data from the source table to the target table, but the bottom line is: avoid large transactions, because they hurt the performance of both the data migration and the database ledger verification.
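If you do roll your own copy, a rough sketch of keeping transactions small could look like the following. This is illustrative only: it assumes [Fact].[Sale] has an increasing surrogate key, [Sale Key], as in the standard sample schema, and the 100,000-row batch size is an arbitrary choice.

-- Illustrative sketch: copy the data in small, separate transactions,
-- walking the source table by its (assumed) increasing [Sale Key].
DECLARE @batch_size INT = 100000;
DECLARE @last_key BIGINT = 0;
DECLARE @next_key BIGINT;

WHILE 1 = 1
BEGIN
    -- Determine the upper key boundary of the next batch.
    SELECT @next_key = MAX(k.[Sale Key])
    FROM (
        SELECT TOP (@batch_size) [Sale Key]
        FROM [Fact].[Sale]
        WHERE [Sale Key] > @last_key
        ORDER BY [Sale Key]
    ) AS k;

    IF @next_key IS NULL BREAK;  -- nothing left to copy

    -- Copy one batch in its own small transaction.
    BEGIN TRANSACTION;
    INSERT INTO [Fact].[Sale_Ledger_2] ([City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key])
    SELECT [City Key], [Customer Key], [Bill To Customer Key], [Stock Item Key], [Invoice Date Key], [Delivery Date Key], [Salesperson Key], [WWI Invoice ID], [Description], [Package], [Quantity], [Unit Price], [Tax Rate], [Total Excluding Tax], [Tax Amount], [Profit], [Total Including Tax], [Total Dry Items], [Total Chiller Items], [Lineage Key]
    FROM [Fact].[Sale]
    WHERE [Sale Key] > @last_key AND [Sale Key] <= @next_key;
    COMMIT TRANSACTION;

    SET @last_key = @next_key;
END;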
