Azure Data Factory Blog

Process your data in seconds with new ADF real-time CDC

Mark Kromer
Microsoft
Mar 03, 2023

In January, we announced that we had elevated our Change Data Capture (CDC) features to front and center in ADF. In ADF, CDC processes are lightweight, always-running (not batch) data processes with a configurable latency. Until today, the lowest latency we allowed for CDC processing was 15 minutes. Today, I am super excited to announce that we have enabled the real-time option!

Now you can process your change data in seconds. Follow these instructions for setting up a CDC process in ADF and set the Latency to "real-time". That's it! You won't need to build a pipeline or set a trigger. Your CDC process will continuously look for changes on your sources until you stop it. In the monitoring view for your CDC processes, you will see checkpoints occur every few seconds as ADF continues to monitor your sources for changes.
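
If it helps to picture what "continuously look for changes" with periodic checkpoints means, here is a minimal Python sketch of the general pattern. It is not ADF's implementation or API: `Change`, `CdcProcess`, `read_changes`, and `write_target` are hypothetical names invented for illustration, and the 5-second interval is only an assumed stand-in for "every few seconds".

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Change:
    sequence: int   # assumed: monotonically increasing position in the source change log
    payload: dict


@dataclass
class CdcProcess:
    # Hypothetical callbacks: read_changes returns changes with sequence > checkpoint,
    # write_target applies a batch of changes to the target.
    read_changes: Callable[[int], Iterable[Change]]
    write_target: Callable[[List[Change]], None]
    checkpoint: int = 0
    checkpoints: List[int] = field(default_factory=list)

    def poll_once(self) -> int:
        """Read changes past the checkpoint, apply them, then record a checkpoint."""
        batch = sorted(self.read_changes(self.checkpoint), key=lambda c: c.sequence)
        if batch:
            self.write_target(batch)
            # Advance only after the write succeeds, so a failed write is retried.
            self.checkpoint = batch[-1].sequence
        self.checkpoints.append(self.checkpoint)
        return len(batch)

    def run(self, should_stop: Callable[[], bool],
            interval_seconds: float = 5.0,
            sleep: Callable[[float], None] = time.sleep) -> None:
        """Keep polling until the caller asks to stop; no pipeline or trigger involved."""
        while not should_stop():
            self.poll_once()
            sleep(interval_seconds)
```

Because each poll starts from the last recorded checkpoint, a change is picked up once and an idle poll still records a checkpoint, which is roughly what the steady stream of checkpoints in monitoring reflects.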

To make building your CDC processes even faster and simpler, we've also introduced auto-mapping to CDC. Now when you build change data processes, ADF will automatically map your sources to your targets without the need for column mapping. You can always move the toggle to turn off auto-mapping and map your columns semantically, including the fuzzy lookup logic that is built into ADF. Auto-mapping, as opposed to column mapping, supports schema drift, so ADF can account for column changes between individual polling intervals.
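
To see why auto-mapping copes with schema drift while a fixed column mapping does not, consider this small Python sketch. It assumes auto-mapping passes columns through by name, which this post does not spell out; `map_row` and the sample column names are hypothetical and are not part of ADF.

```python
from typing import Dict, Optional


def map_row(row: Dict[str, object],
            column_map: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Map one source row to a target row.

    column_map=None models auto-mapping (assumed here to be name-based):
    every source column flows through under its own name.
    Otherwise only the listed source columns are carried over, renamed.
    """
    if column_map is None:
        return dict(row)
    return {target: row[source]
            for source, target in column_map.items()
            if source in row}


# A source row after a new "email" column appears between two polling intervals.
drifted_row = {"id": 1, "name": "Ada", "email": "ada@example.com"}

# Explicit column mapping defined before the drift happened.
explicit = {"id": "customer_id", "name": "customer_name"}

auto_result = map_row(drifted_row)               # carries "email" along
explicit_result = map_row(drifted_row, explicit)  # silently drops "email"
```

With the explicit mapping, the new column never reaches the target until someone edits the mapping; with the name-based pass-through, it arrives on the next poll.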

Updated Mar 07, 2023

7 Comments

  • ernosoinila
    Copper Contributor

    The top-level CDC (preview) functionality is not really compatible with a CI/CD workflow where you have multiple data factories: say, one for development (connected to git) and another for production (updated from, for example, Azure DevOps based on the ARM scripts in the git repo).

    If you want to move the CDC (preview) process from dev to prod, you may lose data while the process is down during the deploy to the production data factory. Is there a plan to make the top-level CDC (preview) feature work with CI/CD processes across multiple data factories? Maybe a way to transfer the CDC watermark between data factories, or something like that. Not having "preview" in the name would also be nice.

    Is it possible to do real-time, headless updates of data sourced from Azure SQL (using CDC) from pipelines (with data flows in them) instead of using the top-level CDC (preview) feature?

  • neelkarve  We are working on plans to bring this to Fabric later in 2023/2024. We do not have a way today in CDC to reprocess specific events or times. If you wish to do this in ADF, the best way is to use Tumbling Window triggers and use the rerun or backfill feature.

  • neelkarve
    Copper Contributor

    Will this be available on MS Fabric?

    Also, how do you incrementally reprocess dependent datasets after a CDC event?

  • yoda-108
    Copper Contributor

    Mark Kromer With real-time latency, the runs gradually slow down in frequency until they stop completely. Any ideas on how to debug this issue?

  • AravinthK
    Copper Contributor

    Mark Kromer 

    1. Is any setup required in the data source (e.g., Azure SQL, SQL Server) to enable CDC?

    2. Does the ADF CDC process have a performance impact on source systems such as Azure SQL or Cosmos DB?