Latest Discussions
Advice requested: how to capture full SQL CDC changes using Dataflow and ADLS gen2
Hi, I'm working on a fairly simple ETL process using Dataflow in Azure Data Factory, where I want to capture the changes in a CDC-enabled SQL table and store them in Delta Lake format in an ADLS gen2 sink. The resulting dataset will be processed further, but for me this is the end of the line.

I don't have an expert understanding of all the details of the Delta Lake format, but I do know that I can use it to store changes to my data over time. So in the sink I enabled all update methods (Insert, Delete, Upsert, Update), since my CDC source should be able to figure out the correct row transformation. Key columns are set to the primary key columns in SQL.

All this works fine as long as I configure my source to use CDC with 'netChanges: true'. That yields a single change row for each record, which is correctly stored in the sink. But I want to capture all changes since the previous run, so I want to set the source to 'netChanges: false'. That yields rows for every change since the previous time the dataflow ran. For every table that actually has records with more than one change, however, the dataflow fails with "Cannot perform Merge as multiple source rows matched and attempted to modify the same target row in the Delta table in possibly conflicting ways."

I take that to mean that my dataflow, as it stands, is not smart enough to loop through all changes in the source and apply them to the sink in order. So apparently something else has to be done. My intuition says that, since CDC provides all the metadata to make this possible, there's probably an out-of-the-box way to achieve what I want, but I can't readily find the magic box I should tick 😉. I can probably build it 'by hand', by somehow looping over all changes and applying them in order, but before I go down that route, I came here to learn from the experts whether this is indeed the only way, or, preferably, whether there is a neat trick I missed to get this done easily. Thanks so much for your advice! BR

AnnejanBarelds, Jul 03, 2025
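A minimal PySpark/Delta sketch of one way around the merge conflict, shown outside ADF to make the idea concrete: when a run picks up several CDC rows for the same key, collapse them to the last change per key (ordered by the SQL Server CDC metadata columns) before merging, so the merge never sees two source rows for one target row. The paths, the OrderId key, and the assumption that the staging data still carries __$start_lsn / __$seqval / __$operation are all hypothetical; the same collapse-to-latest step could be done inside the Data Flow with a window or aggregate transformation before the sink.

from pyspark.sql import SparkSession, functions as F, Window
from delta.tables import DeltaTable

spark = SparkSession.builder.getOrCreate()

# Hypothetical staging path holding the raw CDC rows for one run (netChanges: false).
cdc_df = spark.read.parquet("abfss://raw@mystorage.dfs.core.windows.net/cdc/orders")

# Keep only the newest change per primary key, using the CDC ordering columns.
# __$operation is the tiebreaker so the after-image (4) of an update wins over the before-image (3).
w = (Window.partitionBy("OrderId")
     .orderBy(F.col("__$start_lsn").desc(), F.col("__$seqval").desc(), F.col("__$operation").desc()))
latest = (cdc_df
          .withColumn("rn", F.row_number().over(w))
          .filter("rn = 1")
          .drop("rn"))

target = DeltaTable.forPath(spark, "abfss://curated@mystorage.dfs.core.windows.net/orders")

# __$operation: 1 = delete, 2 = insert, 3 = update (before image), 4 = update (after image).
(target.alias("t")
 .merge(latest.alias("s"), "t.OrderId = s.OrderId")
 .whenMatchedDelete(condition="s.`__$operation` = 1")
 .whenMatchedUpdateAll(condition="s.`__$operation` IN (2, 4)")
 .whenNotMatchedInsertAll(condition="s.`__$operation` IN (2, 4)")
 .execute())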
Solution: Handling Concurrency in Azure Data Factory with Marker Files and Web Activities
Hi everyone, I wanted to share a concurrency issue we encountered in Azure Data Factory (ADF) and how we resolved it using a small but effective enhancement, one that might be useful if you're working with shared Blob Storage across multiple environments (like Dev, Test, and Prod).

Background: Shared Blob Storage & Marker Files
In our ADF pipelines, we extract data from various sources (e.g., SharePoint, Oracle) and store them in Azure Blob Storage. That Blob container is shared across multiple environments. To prevent duplicate extractions, we use marker files:
- started.marker → created when a copy begins
- completed.marker → created when the copy finishes successfully
If both markers exist, pipelines reuse the existing file (caching logic). This mechanism was already in place and worked well under normal conditions.

The Issue: Race Conditions
We observed that simultaneous executions from multiple environments sometimes led to:
- Overlapping attempts to create the same started.marker
- Duplicate copy activities
- Corrupted Blob files
This became a serious concern because the Blob file was later loaded into Azure SQL Server, and any corruption led to failed loads.

The Fix: Web Activity + REST API
To solve this, we modified only the creation of started.marker by:
- Replacing the Copy Activity with a Web Activity that calls the Azure Storage REST API
- Using Azure Blob Storage's conditional header If-None-Match: * so the file is created only if it doesn't exist
- Handling the "BlobAlreadyExists" response from the API by skipping, since it means the file is already there
The Copy Activity is still used to copy the data and create the completed.marker; no changes needed there.

Updated Flow
1. Check marker files:
   - If both exist (started and completed) → use cached file
   - If only started.marker exists → wait and retry
   - If none exist → continue to step 2
2. Web Activity calls the REST API to create started.marker
   - Success → proceed with the copy in step 3
   - Failure → another run already started → skip/retry
3. Copy Activity performs the data extract
4. Copy Activity creates completed.marker

Benefits
- Atomic creation of started.marker → no race conditions
- Minimal change to the existing pipeline logic with marker files
- Reliable downstream loads into Azure SQL Server
- Preserves the existing architecture (no full redesign)

Would love to hear:
- Have you used similar marker-based patterns in ADF?
- Any other approaches to concurrency control that worked for your team?
Thanks for reading! Hope this helps someone facing similar issues.

mkoralage, Jun 29, 2025
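For anyone who wants to try the conditional create outside ADF first, here is a minimal Python sketch of the same REST call the Web Activity makes. The storage account, container, blob path, and SAS token are placeholders; inside the pipeline the identical headers go on the Web Activity request.

import requests

account = "mystorageaccount"          # placeholder
container = "extracts"                # placeholder
blob_name = "dataset1/started.marker" # placeholder
sas = "sv=...&sig=..."                # placeholder SAS token with create/write permission

url = f"https://{account}.blob.core.windows.net/{container}/{blob_name}?{sas}"
headers = {
    "x-ms-blob-type": "BlockBlob",  # Put Blob requires the blob type header
    "If-None-Match": "*",           # only succeed if the blob does not exist yet
}

resp = requests.put(url, headers=headers, data=b"")

if resp.status_code == 201:
    print("started.marker created; this run owns the extract")
elif resp.status_code in (409, 412):
    # the service reports the blob already exists (e.g. BlobAlreadyExists); another run got there first
    print("another run already created started.marker; skip or retry later")
else:
    resp.raise_for_status()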
Blob Storage Event Trigger Disappears
Yesterday I ran into an odd situation where there was a resource lock and I was unable to rename pipelines or drop/create storage event triggers. An admin cleared the lock and I was able to remove and clean up the triggers and pipelines. Today, when I try to recreate the blob storage trigger to process a file when it appears in a container, the trigger creates just fine, but on refresh it disappears. If I try to recreate it again with the same name as the one that went away, the ADF UI says it already exists, yet I cannot assign it to a pipeline because the UI does not see it. Any insight as to where it is, how I can see it, or which logs would have recorded such activity to give a clue as to what is going on? This seems like a bug.

RobDuMo, Jun 24, 2025
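One way to see what the service itself holds, independently of the ADF Studio UI (which can get out of step with the published factory, for example when Git mode and live mode diverge), is to list the factory's triggers through the management SDK. A minimal sketch; the subscription, resource group, and factory names are placeholders.

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Lists every trigger the live (published) factory knows about, even if the UI does not show it.
for t in client.triggers.list_by_factory("<resource-group>", "<factory-name>"):
    print(t.name, type(t.properties).__name__, t.properties.runtime_state)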
Best practice to integrate to Azure DevOps?
Different sources suggest different recommendations regarding ADF and ADO integration. Some say to use the 'adf_publish' branch, while others suggest using the 'main' branch as the source for triggering YAML pipelines and disabling the 'Publish' function in ADF. I guess practices change and setups can differ; the problem is that finding all this information on the Internet makes it so confusing. So, the question is: what is the best practice now (taking into account all the latest changes in ADO) regarding branches? How do you set up your ADF and ADO integrations?

alwaysLearner, Jun 08, 2025
Parameterization of Linked Services
I am trying to parameterize a Linked Service in ADF. I've probably got confused, and I hope someone can make it clear. Two questions:
1. I have two parameters: 'url' and 'secretName'. However, in the ARM template I only see the 'url' parameter, not 'secretName'. Why is 'secretName' not parameterized?
2. How do I supply a value for the 'url' parameter when I deploy the ARM template to another environment (let's say the 'Test' environment)?

These are the files:

Linked Service:
{
  "name": "LS_DynamicParam",
  "properties": {
    "parameters": {
      "SA_URL": { "type": "String", "defaultValue": "https://saforrisma.dfs.core.windows.net/" },
      "SecretName": { "type": "String", "defaultValue": "MySecretInKeyVault" }
    },
    "annotations": [],
    "type": "AzureBlobFS",
    "typeProperties": {
      "url": "@{linkedService().SA_URL}",
      "accountKey": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "LS_AKV", "type": "LinkedServiceReference" },
        "secretName": { "value": "@linkedService().SecretName", "type": "Expression" }
      }
    }
  }
}

ARMTemplateParametersForFactory.json:
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "factoryName": { "value": "ADF-Dev" },
    "LS_AKV_properties_typeProperties_baseUrl": { "value": "https://kv-forrisma.vault.azure.net/" },
    "LS_MAINStorage_properties_typeProperties_connectionString_secretName": { "value": "storageaccount-adf-dev" },
    "LS_DynamicParam_properties_typeProperties_url": { "value": "@{linkedService().SA_URL}" }
  }
}

alwaysLearner, Jun 08, 2025
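On question 1, which type properties get surfaced as ARM parameters is governed by the factory's parameterization template (arm-template-parameters-definition.json), so secretName may need to be added there before it appears; that is an assumption about this particular setup rather than a confirmed diagnosis. On question 2, the generated parameter is typically overridden at deployment time. Below is a minimal sketch of that override using the Azure Python SDK; the resource group, factory name, and Test-environment URL are placeholders, and the same values can equally be passed from an Azure DevOps release task.

import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.resource import ResourceManagementClient
from azure.mgmt.resource.resources.models import Deployment, DeploymentProperties

subscription_id = "<subscription-id>"   # placeholder
rg = "rg-adf-test"                      # placeholder resource group for the Test environment

client = ResourceManagementClient(DefaultAzureCredential(), subscription_id)

# ARMTemplateForFactory.json is the template ADF generates alongside the parameters file.
with open("ARMTemplateForFactory.json") as f:
    template = json.load(f)

parameters = {
    "factoryName": {"value": "ADF-Test"},
    "LS_AKV_properties_typeProperties_baseUrl": {"value": "https://kv-test.vault.azure.net/"},
    # Override of the generated 'url' parameter with the Test storage account URL (placeholder).
    "LS_DynamicParam_properties_typeProperties_url": {"value": "https://satest.dfs.core.windows.net/"},
}

client.deployments.begin_create_or_update(
    rg,
    "adf-test-deployment",
    Deployment(properties=DeploymentProperties(
        mode="Incremental",
        template=template,
        parameters=parameters,
    )),
).result()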
Another Oracle 2.0 issue
It seemed like the Oracle LS 2.0 connector was finally working in production. However, some pipelines have started to fail in both production and development environments with the following error message:

ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.ArrayIndexOutOfBoundsException:255 total entry:1 com.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.addDecimalColumn(ParquetWriterBuilderBridge.java:107) .,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'

When I revert the Linked Service version back to 1.0, the copy activity runs successfully. Has anyone encountered this issue before or found a workaround?

ernestd, Jun 06, 2025
Auto update of table in target (Snowflake) when source schema changes (SQL)
Hi, this is my use case: I have SQL Server as the source and Snowflake as the target. I have a dataflow in place to load historic and CDC records from SQL to Snowflake; I am using the inline CDC option available in dataflows, which relies on SQL Server's CDC functionality. The problem is that some tables in my source have schema changes fairly often, say once a month, and I want the target tables to be altered based on those schema changes.

Note:
1. I could only use a dataflow for loading, since we don't have watermark columns in the SQL tables.
2. Recreating the table in the target on each load is not a good option, since we have billions of records altogether.

Please help me with a solution for this. Thanks

DivyaniPardeshi, Jun 04, 2025
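One hedged option, sketched below, is a small pre-load step (for example a script run just before the dataflow) that compares the source and target schemas and adds any new columns to the Snowflake table, so nothing is recreated and only new columns are touched. The connection details, table name, and the SQL Server to Snowflake type mapping are all placeholders and would need adjusting.

import pyodbc
import snowflake.connector

# Placeholder connection details and a deliberately small type mapping.
SQL_CONN = "DRIVER={ODBC Driver 18 for SQL Server};SERVER=onprem-sql;DATABASE=SalesDb;UID=etl;PWD=..."
TYPE_MAP = {"int": "NUMBER", "bigint": "NUMBER", "varchar": "VARCHAR", "nvarchar": "VARCHAR",
            "datetime2": "TIMESTAMP_NTZ", "decimal": "NUMBER(38,10)", "bit": "BOOLEAN"}

def source_columns(cursor, table):
    cursor.execute(
        "SELECT COLUMN_NAME, DATA_TYPE FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?", table)
    return {name.upper(): dtype for name, dtype in cursor.fetchall()}

def target_columns(cursor, table):
    cursor.execute(
        f"SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = '{table.upper()}'")
    return {row[0].upper() for row in cursor.fetchall()}

sql_cur = pyodbc.connect(SQL_CONN).cursor()
sf_cur = snowflake.connector.connect(account="myaccount", user="etl", password="...",
                                     warehouse="ETL_WH", database="SALES", schema="PUBLIC").cursor()

table = "Orders"  # placeholder table name
missing = {c: t for c, t in source_columns(sql_cur, table).items()
           if c not in target_columns(sf_cur, table)}

# Add any new source columns to the Snowflake table before the dataflow runs.
for col, sql_type in missing.items():
    sf_type = TYPE_MAP.get(sql_type, "VARCHAR")
    sf_cur.execute(f'ALTER TABLE {table.upper()} ADD COLUMN "{col}" {sf_type}')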
On-Prem SQL server db to Azure SQL db
Hi, I'm trying to copy data from an on-prem SQL Server db to an Azure SQL db using a SHIR and ADF. I've stood up ADF, a SQL server, and a db in Azure. As far as I know, we need to download the SHIR from the download link, install it on the on-prem server, and register that SHIR with the ADF key. The on-prem SQL Server has TCP/IP connections enabled. What other setup do I need to do on the on-prem server, such as firewall, IP, or port configurations? The on-prem SQL Server is in a different network which is not connected to our network.

Manu, May 28, 2025
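As a quick sanity check before involving ADF at all, the sketch below, run from the machine that will host the SHIR, verifies that the SQL Server port is reachable; the host name and port are placeholders, and 1433 is only the default-instance port. The SHIR itself generally needs only outbound HTTPS (443) to Azure, so the firewall work is mostly about letting the SHIR host reach the SQL Server.

import socket

# Placeholders: replace with the on-prem SQL Server host/IP and its actual port.
host, port = "onprem-sql.corp.local", 1433

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"TCP connection to {host}:{port} succeeded; the SHIR host can reach SQL Server")
except OSError as exc:
    print(f"Cannot reach {host}:{port}: {exc}; check firewall rules and the SQL Server port configuration")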
How to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow?
Hi Community, I'm trying to extract and load data from an API returning the following JSON format into an Azure SQL table using Azure Data Factory.

{
  "2023-07-30": [],
  "2023-07-31": [],
  "2023-08-01": [
    { "breakdown": "email", "contacts": 2, "customers": 2 }
  ],
  "2023-08-02": [],
  "2023-08-03": [
    { "breakdown": "direct", "contacts": 5, "customers": 1 },
    { "breakdown": "referral", "contacts": 3, "customers": 0 }
  ],
  "2023-08-04": [],
  "2023-09-01": [
    { "breakdown": "direct", "contacts": 76, "customers": 40 }
  ],
  "2023-09-02": [],
  "2023-09-03": []
}

Goal: I want to flatten this nested structure and load it into Azure SQL like this:

ReportDate   Breakdown   Contacts   Customers
2023-07-30   (no row)    (no row)   (no row)
2023-07-31   (no row)    (no row)   (no row)
2023-08-01   email       2          2
2023-08-02   (no row)    (no row)   (no row)
2023-08-03   direct      5          1
2023-08-03   referral    3          0
2023-08-04   (no row)    (no row)   (no row)
2023-09-01   direct      76         40
2023-09-02   (no row)    (no row)   (no row)
2023-09-03   (no row)    (no row)   (no row)

SYN, May 24, 2025
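A minimal Python sketch of the flattening logic, outside ADF, to make the target shape explicit: every date key becomes a ReportDate, empty dates still produce one row with NULLs, and each array element becomes its own row. The inline payload is a trimmed copy of the sample above.

import json

# Trimmed copy of the API sample; in practice this would be the fetched response body.
payload = """{
  "2023-08-01": [{"breakdown": "email", "contacts": 2, "customers": 2}],
  "2023-08-02": [],
  "2023-08-03": [{"breakdown": "direct", "contacts": 5, "customers": 1},
                 {"breakdown": "referral", "contacts": 3, "customers": 0}]
}"""

rows = []
for report_date, entries in json.loads(payload).items():
    if not entries:
        # Keep empty dates as a single row with NULL breakdown/contacts/customers.
        rows.append((report_date, None, None, None))
    else:
        for e in entries:
            rows.append((report_date, e["breakdown"], e["contacts"], e["customers"]))

for r in rows:
    print(r)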
Oracle 2.0 property authenticationType is not specified
I just published an upgrade to the Oracle 2.0 connector (linked service) and all my pipelines ran OK in dev. This morning I woke up to lots of red pipelines that ran during the night. I get the following error message:

ErrorCode=OracleConnectionOpenError,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to open the Oracle database connection.,Source=Microsoft.DataTransfer.Connectors.OracleV2Core,''Type=System.ArgumentException,Message=The required property is not specified. Parameter name: authenticationType,Source=Microsoft.Azure.Data.Governance.Plugins.Core,'

Here is the code for my Oracle linked service:

{
  "name": "Oracle",
  "properties": {
    "parameters": {
      "host": { "type": "string" },
      "port": { "type": "string", "defaultValue": "1521" },
      "service_name": { "type": "string" },
      "username": { "type": "string" },
      "password_secret_name": { "type": "string" }
    },
    "annotations": [],
    "type": "Oracle",
    "version": "2.0",
    "typeProperties": {
      "server": "@{linkedService().host}:@{linkedService().port}/@{linkedService().service_name}",
      "authenticationType": "Basic",
      "username": "@{linkedService().username}",
      "password": {
        "type": "AzureKeyVaultSecret",
        "store": { "referenceName": "Keyvault", "type": "LinkedServiceReference" },
        "secretName": { "value": "@linkedService().password_secret_name", "type": "Expression" }
      },
      "supportV1DataTypes": true
    },
    "connectVia": {
      "referenceName": "leap-prod-onprem-ir-001",
      "type": "IntegrationRuntimeReference"
    }
  }
}

As you can see, "authenticationType" is defined, but my guess is that the publish and deployment step somehow drops that property. We are using "modern" deployment in Azure DevOps pipelines using Node.js. Would appreciate some help with this!

Solved
martin_larsson_ellevio, May 22, 2025
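One hedged way to confirm whether the publish/deployment step really drops the property is to read the linked service back from the live factory and inspect the definition the service actually stored. A minimal sketch using the management SDK; the subscription, resource group, and factory names are placeholders.

import json
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")
ls = client.linked_services.get("<resource-group>", "<factory-name>", "Oracle")

# Dump the stored definition and check whether the authentication type survived deployment.
definition = json.dumps(ls.as_dict(), indent=2, default=str)
print(definition)
print("authenticationType present:",
      "authentication_type" in definition or "authenticationType" in definition)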
Tags
- azure data factory (170 topics)
- Azure ETL (46 topics)
- Copy Activity (38 topics)
- Azure Data Integration (37 topics)
- Mapping Data Flows (28 topics)
- Azure Integration Runtime (24 topics)
- ADF (5 topics)
- azure data factory v2 (3 topics)
- Data Flows (3 topics)
- pipeline (3 topics)