Amazon Web Services (AWS) has announced the discontinuation of its Amazon Quantum Ledger Database (QLDB). In my previous blog post Moving from Amazon Quantum Ledger Database (QLDB) to ledger in Azure SQL we discussed that Microsoft offers an excellent alternative for users to host their data with cryptographic immutability. This ensures they can uphold rigorous data integrity standards. Ledger in Azure SQL serves as a strong alternative, with effortless integration into the Azure SQL environment.
This post outlines a method for migrating an Amazon QLDB ledger to Azure SQL Database, utilizing the US Department of Motor Vehicles (DMV) sample ledger from the tutorial in the Amazon QLDB Developer Guide as a reference. You can adapt this solution for your own schema and migration strategy.
Navigating Database Migrations: Key Considerations for a Smooth Transition
When embarking on a database migration journey, one of the first decisions you’ll face is determining the scope of the data to be migrated. Depending on your application’s needs, you might opt to transfer the entire database, including all historical data, or you might choose to migrate only the most recent data, archiving the older records for future reference.
Another crucial aspect to consider is how you’ll model the data in your new Azure SQL database. This involves transforming the ledger data to align with your chosen model during the migration process. You have a couple of options here:
- Normalization: This approach involves converting the document model to a relational model. While this can be challenging, it’s often the best way to ensure your data is in a usable format for a relational database post-migration.
- JSON Storage: Alternatively, you could migrate the complete JSON into a single column of the ledger table. This simplifies the migration process but may not be the most efficient for data access in the relational database.
Each option has its own set of trade-offs, and the right choice will depend on your specific use case and requirements. In this blog post, we will use the normalization approach.
Solution
The solution is build in Azure Data Factory and is partially based on the blog post Dynamically Map JSON to SQL in Azure Data Factory | Under the kover of business intelligence (sqlkover.com) by MVP Koen Verbeeck.
Prerequisites
Preparing Source Files
To kick off the data migration process, we first need to prepare the source files. There are two primary approaches to consider:
- Transform and Import CSV Files: Convert Amazon QLDB JSON documents into CSV files and import them into the ledger tables of your Azure SQL Database. This involves exporting and converting the Amazon QLDB ledger to CSV files, as detailed in the “Export Data” and “Extract and Transform” sections of this blog post. The CSV files can then be imported into Azure SQL Database using methods such as:
- Bulk Copy
- Copy Activity in Azure Data Factory: This activity migrates data from source files to Azure SQL Database.
- Save and Import JSON Files: Save the QLDB data as JSON files on Azure Storage and import them with Azure Data Factory (ADF) as relational data.
In this post, we’ll focus on the second approach and provide a detailed walkthrough. Follow these steps to create the source files:
- Step 1: Open the section “Step 2: Create tables, indexes, and sample data in a ledger” in the Amazon Quantum Ledger Database (Amazon QLDB) documentation.
- Step 2: Navigate to the “Manual Option” section and copy the JSON sample data for the Person, DriversLicense, VehicleRegistration, and Vehicle tables into separate JSON files. Make sure you use the correct JSON syntax like the example below. Name the files according to their respective tables.
- Step 3: Upload these JSON files to an Azure Storage Container.
[
    {
        "FirstName" : "Raul",
        "LastName" : "Lewis",
        "DOB" : "1963-08-19",
        "GovId" : "LEWISR261LL",
        "GovIdType" : "Driver License",
        "Address" : "1719 University Street, Seattle, WA, 98109"
    },
    {
        "FirstName" : "Brent",
        "LastName" : "Logan",
        "DOB" : "1967-07-03",
        "GovId" : "LOGANB486CG",
        "GovIdType" : "Driver License",
        "Address" : "43 Stockert Hollow Road, Everett, WA, 98203"
    },
    {
        "FirstName" : "Alexis",
        "LastName" : "Pena",
        "DOB" : "1974-02-10",
        "GovId" : "744 849 301",
        "GovIdType" : "SSN",
        "Address" : "4058 Melrose Street, Spokane Valley, WA, 99206"
    },
    {
        "FirstName" : "Melvin",
        "LastName" : "Parker",
        "DOB" : "1976-05-22",
        "GovId" : "P626-168-229-765",
        "GovIdType" : "Passport",
        "Address" : "4362 Ryder Avenue, Seattle, WA, 98101"
    },
    {
        "FirstName" : "Salvatore",
        "LastName" : "Spencer",
        "DOB" : "1997-11-15",
        "GovId" : "S152-780-97-415-0",
        "GovIdType" : "Passport",
        "Address" : "4450 Honeysuckle Lane, Seattle, WA, 98101"
    }
]
By following these steps, you’ll have four JSON source files stored in an Azure Storage Account, ready to be used for data migration.
Preparing target database and tables
In this example, we’ll be using an Azure SQL Database as our target. It’s important to note that the ledger feature is also available in Azure SQL Managed Instance and SQL Server 2022. For each JSON file created in the previous section, we’ll set up an updatable ledger table.
Follow these steps to create the database and the updatable ledger tables:
- Create a Single Database: Begin by creating a single database in Azure SQL Database. You can find detailed instructions in the Azure SQL Database documentation.
- Create Updatable Ledger Tables: Next, run the script provided below to create the updatable ledger tables. Make sure to adjust the script according to your specific requirements.
CREATE TABLE dbo.Person (
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50),
    DOB DATE,
    GovId NVARCHAR(50),
    GovIdType NVARCHAR(50),
    Address NVARCHAR(255)
)
WITH 
(
 SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[PersonHistory]),
 LEDGER = ON
);
GO
CREATE TABLE dbo.DriversLicense (
    LicensePlateNumber NVARCHAR(50),
    LicenseType NVARCHAR(50),
    ValidFromDate DATE,
    ValidToDate DATE,
    PersonId NVARCHAR(50)
)
WITH 
(
 SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[DriversLicenseHistory]),
 LEDGER = ON
);
GO
CREATE TABLE dbo.VehicleRegistration (
    VIN NVARCHAR(50),
    LicensePlateNumber NVARCHAR(50),
    State NVARCHAR(50),
    City NVARCHAR(50),
    PendingPenaltyTicketAmount DECIMAL(10, 2),
    ValidFromDate DATE,
    ValidToDate DATE,
    PrimaryOwner NVARCHAR(100),
	SecondaryOwner NVARCHAR(100)
)
WITH 
(
 SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[VehicleRegistrationHistory]),
 LEDGER = ON
);
GO
CREATE TABLE dbo.Vehicle (
    VIN NVARCHAR(50),
    Type NVARCHAR(50),
    Year INT,
    Make NVARCHAR(50),
    Model NVARCHAR(50),
    Color NVARCHAR(50)
)
WITH 
(
 SYSTEM_VERSIONING = ON (HISTORY_TABLE = [dbo].[VehicleHistory]),
 LEDGER = ON
);
- Configure Database User: Finally, create a database user that corresponds to the Managed Identity of your Azure Data Factory. Add this new user to the db_datawriter role to ensure the Azure Data Factory pipeline has the necessary permissions to write to the database.
CREATE USER [ledgeradf] FROM EXTERNAL PROVIDER;
ALTER ROLE [db_datawriter] ADD MEMBER [ledgeradf]
GRANT EXECUTE TO [ledgeradf];
By following these steps, you’ll set up your Azure SQL Database with the appropriate ledger tables, ready to handle the data migration from your JSON files.
As a next step, we need to create two additional mapping tables. The first table will map SQL Server data types to the corresponding data types expected by Azure Data Factory. The second table will establish a mapping between table names and their collection references. Below are the scripts to create and populate these tables:
CREATE TABLE ADF_DataTypeMapping ( [ADFTypeMappingID] int, [ADFTypeDataType] varchar(20), [SQLServerDataType] varchar(20) )
INSERT INTO ADF_DataTypeMapping ([ADFTypeMappingID], [ADFTypeDataType], [SQLServerDataType])
VALUES
( 1, 'Int64', 'BIGINT' ), 
( 2, 'Byte array', 'BINARY' ), 
( 3, 'Boolean', 'BIT' ), 
( 4, 'String', 'CHAR' ), 
( 5, 'DateTime', 'DATE' ), 
( 6, 'DateTime', 'DATETIME' ), 
( 7, 'DateTime', 'DATETIME2' ), 
( 8, 'DateTimeOffset', 'DATETIMEOFFSET' ), 
( 9, 'Decimal', 'DECIMAL' ), 
( 10, 'Double', 'FLOAT' ), 
( 11, 'Byte array', 'IMAGE' ), 
( 12, 'Int32', 'INT' ), 
( 13, 'Decimal', 'MONEY' ), 
( 14, 'String', 'NCHAR' ), 
( 15, 'String', 'NTEXT' ), 
( 16, 'Decimal', 'NUMERIC' ), 
( 17, 'String', 'NVARCHAR' ), 
( 18, 'Single', 'REAL' ), 
( 19, 'Byte array', 'ROWVERSION' ), 
( 20, 'DateTime', 'SMALLDATETIME' ), 
( 21, 'Int16', 'SMALLINT' ), 
( 22, 'Decimal', 'SMALLMONEY' ), 
( 23, 'Byte array', 'SQL_VARIANT' ), 
( 24, 'String', 'TEXT' ), 
( 25, 'DateTime', 'TIME' ), 
( 26, 'String', 'TIMESTAMP' ), 
( 27, 'Int16', 'TINYINT' ), 
( 28, 'GUID', 'UNIQUEIDENTIFIER' ), 
( 29, 'Byte array', 'VARBINARY' ), 
( 30, 'String', 'VARCHAR' ), 
( 31, 'String', 'XML' ), 
( 32, 'String', 'JSON' );
GO
CREATE TABLE [dbo].[TableCollectionReference](
	[TableName] [nvarchar](255) NULL,
	[collectionreference] [nvarchar](255) NULL
) ON [PRIMARY]
INSERT INTO [dbo].[TableCollectionReference]
           ([TableName]
           ,[collectionreference])
     VALUES
           ('VehicleRegistration','Owners')
As the final step, we need to create a stored procedure for the pipeline to map the JSON file to the database table. This stored procedure, inspired by Koen Verbeeck’s function, will read the table's metadata and convert it into the required JSON structure. You can find the code for this procedure below.
CREATE PROCEDURE [dbo].[usp_Get_JSONTableMapping]
        @TableName VARCHAR(250)
       
    AS
    BEGIN
        SET NOCOUNT ON;
		DECLARE  @CollectionReference VARCHAR(250)
		SELECT @CollectionReference=collectionreference FROM TableCollectionReference where TableName = @TableName
        SELECT jsonmapping = '{"type": "TabularTranslator", "mappings": ' + 
        (
            SELECT
                 'source.path'  = '$[''' + c.[name] + ''']'
              --, 'source.type'  = m.ADFTypeDataType
                , 'sink.name'    = c.[name]
                , 'sink.type'    = m.ADFTypeDataType
            FROM sys.tables                 t
            JOIN sys.schemas                s ON s.schema_id        = t.schema_id
            JOIN sys.all_columns            c ON c.object_id        = t.object_id
            JOIN sys.types                  y ON c.system_type_id   = y.system_type_id
                                                AND c.user_type_id  = y.user_type_id
            JOIN dbo.ADF_DataTypeMapping    m ON y.[name]           = m.SQLServerDataType
            WHERE   1 = 1
                AND t.[name] = @TableName
                AND c.[name] not like 'ledger%'
            ORDER BY c.column_id
            FOR JSON PATH
        ) + ',"collectionreference": "' + ISNULL(@CollectionReference,'') + '","mapComplexValuesToString": true}'; 
    END
Building the Azure Data Factory Pipeline
Configuring Dynamic Data Sets in Azure Data Factory
For the source, we will dynamically fetch JSON files. This requires specifying only the file path of the container in your storage account where the JSON files are stored. This configuration allows Azure Data Factory to automatically process incoming files without the need for manual pipeline updates. See the screenshot below as an example.
The sink, which is your database, will also be configured dynamically. The table name in the database will correspond to the name of the JSON file fetched. To achieve this, we utilize two parameters: one for the table schema and another for the table name. See the screenshot below as an example.
Pipeline Overview
Our Azure Data Factory pipeline operates in the following steps:
- Fetching File Names: The pipeline begins by retrieving the names of all files stored in the designated storage account container.
- Extracting JSON Metadata: For each file identified, the pipeline fetches the JSON metadata. This metadata provides the necessary information about the structure of the JSON file, which is crucial for the subsequent data transfer.
- Copying Data to Database: With the JSON structure in hand, the pipeline then copies the data from each JSON file into the corresponding table in the database. The table names in the database are dynamically matched to the file names, ensuring that data is accurately and efficiently transferred.
Let’s have a closer look on each of these components.
Get Metadata Filenames
When working with Azure Data Factor, the Get Metadata activity is a powerful tool for managing data stored in blob storage or data lake folders. This activity can retrieve a child Items array that lists all files and folders within a specified directory. We utilize the Get Metadata activity to fetch the names of all files stored in a specific storage account container. Define the storage account as the Dataset and the Child Items as the Field list.
For Each File
The child items array that was retrieved is used to go over each file. The following expression is being used for this.
Fetch the JSON metadata
In this activity, we’re going to map the JSON to the table. I’ll use a Lookup activity to execute the stored procedure usp_Get_JSONTableMapping, which we created earlier. The dataset will be the database itself, and the stored procedure’s parameter will be the table name, derived from the file name without the JSON extension.
Copy data to ledger tables
The final step involves copying data from the JSON files to the ledger tables. The source for the Copy Data activity is the Azure Storage Account path, using the file name we retrieved earlier.
The destination (sink) is the database and the corresponding table name, which is dynamically derived from the file name without the JSON extension. Set the write behavior to “Insert” and ensure the Table option is set to “Use existing.”
The mapping is based on the output from the stored procedure executed in the previous step. Activate the Advance editor and add dynamic content like the example below.
Conclusion
Migrating from Amazon QLDB to Azure SQL Database ledger tables offers a robust solution for maintaining data integrity and leveraging the advanced features of Azure's ecosystem. This guide has outlined the key considerations and steps involved in the migration process, including data modelling, preparation of source files, and the configuration of the target database. By following the detailed instructions provided, organizations can ensure a smooth transition and take full advantage of Azure SQL Database's capabilities.
Useful links
For more information and to get started with ledger in Azure SQL, see:
- Explore the Azure SQL Database ledger documentation
- Read the whitepaper
- GitHub demo/sample
- Data Exposed episode (video)
- Listen to the ledger podcast