Applies To: SQL Server, Azure SQL Database, Azure SQL Managed Instance.
Discover the Power of Real-Time Data Tracking with CDC in Azure Synapse Analytics
Unlock the potential of your data with Change Data Capture (CDC).
This process captures all insert, update, and delete operations performed on your database, making the changes available in near real-time. Keep track of every modification made to your data, allowing your applications to consume and integrate these updates seamlessly.
In this guide, we'll show you how to implement CDC in Azure Synapse Analytics data flows. Follow these simple steps to get started.
Prerequisites:
- AdventureWorks sample database from MS (see link below)
- Azure Synapse Workspace (see link below on how to create one)
- SQL Server Management Studio (SSMS)
- Azure Data Lake Gen 2 storage account
- Basic knowledge of CDC concept. What is change data capture (CDC)? - SQL Server | Microsoft Learn
Step 1: Build the dataflows
To create a dataflow in Azure Synapse Analytics, follow these steps:
- Navigate to the Develop tab.
- Click on "Add a Dataflow".
- Add a source, such as the Customer table from the Adventureworks database (Make sure to enable CDC see picture attached below).
- Save the data to a Data Lake by sinking it as Parquet files.
Enable Change Data Capture in Source:
in SalesLT.Product table, we have 295 rows, Let's try to run few changes to the DB to see how it will affect number of rows.
Unfortunately, the change data capture feature won't be displayed in the Data Preview tab of the dataflow. To check the number of rows, I created a basic dataflow with two main activities.
source is the output of dataflow mentioned in step 1 and sink is a cached sink.
Step 2: Validation of CDC concept
now we will try 2 debug runs:
First debug run:
run the first dataflow where we transform data into parquets.
we can see that all rows in the first run are written successfully.
if we run it again - we expect to see 0 rows written since we didn't change the data:
Now I'm going to update a row using SQL query in SSMS like so:
UPDATE [SalesLT].[Product]
SET [Name] = 'sallydabbah'
WHERE [ProductID]= 680
GO
so, we changed one row, in SSMS we can see this row has been changed when we query the table in SSMS:
if we debug run the dataflow again, we should see one row written:
Returning to our second dataflow, we aim to confirm that the number of rows has increased by one, resulting in 296 rows. To verify this, we can click on the "Data Preview" tab in the source of the second dataflow:
Links:
Enable and Disable change data capture - SQL Server | Microsoft Learn
AdventureWorks sample databases - SQL Server | Microsoft Learn
Quickstart: Get started - create a Synapse workspace - Azure Synapse Analytics | Microsoft Learn
Updated Feb 09, 2023
Version 1.0Sally_Dabbah
Microsoft
Joined July 10, 2022
FastTrack for Azure
Follow this blog board to get notified when there's new activity