Blog Post

FastTrack for Azure
2 MIN READ

How To Create CDC In Azure Synapse Analytics Using DataFlows

Sally_Dabbah's avatar
Sally_Dabbah
Icon for Microsoft rankMicrosoft
Feb 09, 2023

Applies To: SQL Server, Azure SQL Database,  Azure SQL Managed Instance.

 

Discover the Power of Real-Time Data Tracking with CDC in Azure Synapse Analytics 

Unlock the potential of your data with Change Data Capture (CDC). 
This process captures all insert, update, and delete operations performed on your database, making the changes available in near real-time. Keep track of every modification made to your data, allowing your applications to consume and integrate these updates seamlessly. 

In this guide, we'll show you how to implement CDC in Azure Synapse Analytics data flows. Follow these simple steps to get started. 
 
 

Prerequisites: 

Step 1: Build the dataflows 

To create a dataflow in Azure Synapse Analytics, follow these steps: 

  1. Navigate to the Develop tab. 
  2. Click on "Add a Dataflow". 
  3. Add a source, such as the Customer table from the Adventureworks database (Make sure to enable CDC see picture attached below).  
  4. Save the data to a Data Lake by sinking it as Parquet files. 

 

 

  

 

 

  

Enable Change Data Capture in Source: 

 

in SalesLT.Product table, we have 295 rows, Let's try to run few changes to the DB to see how it will affect number of rows.  
Unfortunately, the change data capture feature won't be displayed in the Data Preview tab of the dataflow. To check the number of rows, I created a basic dataflow with two main activities. 
 

 

   

source is the output of dataflow mentioned in step 1 and sink is a cached sink.  


Step 2: Validation of CDC concept 
now we will try 2 debug runs: 

First debug run: 
run the first dataflow where we transform data into parquets.  
we can see that all rows in the first run are written successfully.  

 

if we run it again - we expect to see 0 rows written since we didn't change the data: 

 

Now I'm going to update a row using SQL query in SSMS like so:   

UPDATE [SalesLT].[Product] 
SET [Name] = 'sallydabbah' 
WHERE [ProductID]= 680 
GO 

so, we changed one row, in SSMS we can see this row has been changed when we query the table in SSMS: 

 

if we debug run the dataflow again, we should see one row written: 

 

Returning to our second dataflow, we aim to confirm that the number of rows has increased by one, resulting in 296 rows. To verify this, we can click on the "Data Preview" tab in the source of the second dataflow:  
 

 

Links: 

Enable and Disable change data capture - SQL Server | Microsoft Learn 

AdventureWorks sample databases - SQL Server | Microsoft Learn 

Quickstart: Get started - create a Synapse workspace - Azure Synapse Analytics | Microsoft Learn  

 

 

Updated Feb 09, 2023
Version 1.0
No CommentsBe the first to comment