Forum Discussion
bedoonraj
May 21, 2025 (Copper Contributor)
Deduplication with the SAP CDC connector
I have a pipeline in Azure Data Factory (ADF) that uses the SAP CDC connector to extract data from an SAP S/4HANA standard extractor. The pipeline writes data to an Azure staging layer (ADLS), and from there, it moves data to the bronze layer.
All rows are copied from SAP to the staging layer without any data loss. However, when the data moves from staging to bronze, some rows are dropped by the deduplication that ADF applies on the configured primary key.
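To make the dropped rows easier to pin down, here is a minimal Python sketch of how the staging rows that collide on the primary key could be listed, ordered by _SEQUENCENUMBER. It does not reproduce ADF's deduplication logic, which is exactly what is unknown here. The key columns MATNR and PLANT, the sample rows, and the helper name group_duplicate_keys are hypothetical, and the sketch only assumes that _SEQUENCENUMBER values can be sorted.

```python
from collections import defaultdict


def group_duplicate_keys(rows, key_columns, sequence_column="_SEQUENCENUMBER"):
    """Return {key_tuple: [rows]} for every key that occurs more than once.

    Rows within each group are sorted by the sequence column so they can be
    compared against the single row that ends up in bronze.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        groups[key].append(row)
    return {
        key: sorted(members, key=lambda r: r[sequence_column])
        for key, members in groups.items()
        if len(members) > 1
    }


# Hypothetical staging sample: only _SEQUENCENUMBER is a real column name
# from the staging data; MATNR, PLANT and QTY are invented for illustration.
staging_rows = [
    {"MATNR": "A100", "PLANT": "1000", "QTY": 5, "_SEQUENCENUMBER": 1},
    {"MATNR": "A100", "PLANT": "1000", "QTY": 7, "_SEQUENCENUMBER": 2},
    {"MATNR": "B200", "PLANT": "1000", "QTY": 3, "_SEQUENCENUMBER": 3},
]

duplicates = group_duplicate_keys(staging_rows, key_columns=["MATNR", "PLANT"])
for key, members in duplicates.items():
    # Show each colliding key with the sequence numbers of its rows.
    print(key, [m["_SEQUENCENUMBER"] for m in members])
```

Comparing each group against the matching bronze row would show which of the colliding rows survived, which is the behavior the questions that follow are about.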
I have the following questions:
- How does ADF prioritize which row to keep and which to drop during the deduplication process?
- I noticed a couple of ADF-generated columns in the staging data, such as _SEQUENCENUMBER. What is the purpose of these columns, and what logic does ADF use to create or assign values to them?
Any insights would be appreciated.