How To Create CDC In Azure Synapse Analytics Using DataFlows
Published Feb 08 2023 11:26 PM 4,416 Views
Microsoft

Applies To: Sally_Dabbah_0-1675927163299.pngSQL Server, Sally_Dabbah_1-1675927163300.pngAzure SQL Database, Sally_Dabbah_2-1675927163300.png Azure SQL Managed Instance.

 

Discover the Power of Real-Time Data Tracking with CDC in Azure Synapse Analytics 

Unlock the potential of your data with Change Data Capture (CDC). 
This process captures all insert, update, and delete operations performed on your database, making the changes available in near real-time. Keep track of every modification made to your data, allowing your applications to consume and integrate these updates seamlessly. 

In this guide, we'll show you how to implement CDC in Azure Synapse Analytics data flows. Follow these simple steps to get started. 
 
 

Prerequisites: 

Step 1: Build the dataflows 

To create a dataflow in Azure Synapse Analytics, follow these steps: 

  1. Navigate to the Develop tab. 
  2. Click on "Add a Dataflow". 
  3. Add a source, such as the Customer table from the Adventureworks database (Make sure to enable CDC see picture attached below).  
  4. Save the data to a Data Lake by sinking it as Parquet files. 

Sally_Dabbah_3-1675927163301.png

 

 

  

Sally_Dabbah_4-1675927163302.png

 

 

  

Enable Change Data Capture in Source: 

Sally_Dabbah_5-1675927163302.png

 

in SalesLT.Product table, we have 295 rows, Let's try to run few changes to the DB to see how it will affect number of rows.  
Unfortunately, the change data capture feature won't be displayed in the Data Preview tab of the dataflow. To check the number of rows, I created a basic dataflow with two main activities. 
 

Sally_Dabbah_6-1675927163303.png

 

   

source is the output of dataflow mentioned in step 1 and sink is a cached sink.  


Step 2: Validation of CDC concept 
now we will try 2 debug runs: 

First debug run: 
run the first dataflow where we transform data into parquets.  
we can see that all rows in the first run are written successfully.  

Sally_Dabbah_7-1675927163304.png

 

if we run it again - we expect to see 0 rows written since we didn't change the data: 

Sally_Dabbah_8-1675927163305.png

 

Now I'm going to update a row using SQL query in SSMS like so:   

UPDATE [SalesLT].[Product] 
SET [Name] = 'sallydabbah' 
WHERE [ProductID]= 680 
GO 

so, we changed one row, in SSMS we can see this row has been changed when we query the table in SSMS: 

Sally_Dabbah_9-1675927163306.png

 

if we debug run the dataflow again, we should see one row written: 

Sally_Dabbah_10-1675927163306.png

 

Returning to our second dataflow, we aim to confirm that the number of rows has increased by one, resulting in 296 rows. To verify this, we can click on the "Data Preview" tab in the source of the second dataflow:  
 

Sally_Dabbah_11-1675927163307.png

 

Links: 

Enable and Disable change data capture - SQL Server | Microsoft Learn 

AdventureWorks sample databases - SQL Server | Microsoft Learn 

Quickstart: Get started - create a Synapse workspace - Azure Synapse Analytics | Microsoft Learn  

 

 

Co-Authors
Version history
Last update:
‎Feb 08 2023 11:24 PM
Updated by: