Web activity
Solution: Handling Concurrency in Azure Data Factory with Marker Files and Web Activities
Hi everyone, I wanted to share a concurrency issue we encountered in Azure Data Factory (ADF) and how we resolved it with a small but effective enhancement, one that might be useful if you're working with shared Blob Storage across multiple environments (like Dev, Test, and Prod).

Background: Shared Blob Storage & Marker Files

In our ADF pipelines, we extract data from various sources (e.g., SharePoint, Oracle) and store it in Azure Blob Storage. That Blob container is shared across multiple environments. To prevent duplicate extractions, we use marker files:

- started.marker → created when a copy begins
- completed.marker → created when the copy finishes successfully

If both markers exist, pipelines reuse the existing file (caching logic). This mechanism was already in place and worked well under normal conditions.

The Issue: Race Conditions

We observed that simultaneous executions from multiple environments sometimes led to:

- Overlapping attempts to create the same started.marker
- Duplicate copy activities
- Corrupted Blob files

This became a serious concern because the Blob file was later loaded into Azure SQL Server, and any corruption led to failed loads.

The Fix: Web Activity + REST API

To solve this, we modified only the creation of started.marker:

- We replaced the Copy Activity with a Web Activity that calls the Azure Storage REST API.
- The call uses Azure Blob Storage's conditional header If-None-Match: * to create the file only if it doesn't already exist.
- If the file already exists, the API returns a BlobAlreadyExists error, which the pipeline handles by skipping.

The Copy Activity is still used to copy the data and create completed.marker; no changes were needed there. (A sketch of the REST call the Web Activity makes is shown after the updated flow below.)

Updated Flow

1. Check the marker files:
   - If both started.marker and completed.marker exist → use the cached file.
   - If only started.marker exists → wait and retry.
   - If neither exists → continue to step 2.
2. A Web Activity calls the REST API to create started.marker:
   - Success → proceed to the copy in step 3.
   - Failure → another run has already started → skip/retry.
3. The Copy Activity performs the data extract.
4. The Copy Activity creates completed.marker.
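To make the Web Activity's call concrete, here is a minimal Python sketch of the same conditional Put Blob request. The storage account and container names are made up, and DefaultAzureCredential stands in for the pipeline's managed identity, which needs a data-plane role such as Storage Blob Data Contributor:

```python
import requests
from azure.identity import DefaultAzureCredential

# Hypothetical storage account "mystorageacct" and container "extracts".
token = DefaultAzureCredential().get_token(
    "https://storage.azure.com/.default").token

resp = requests.put(
    "https://mystorageacct.blob.core.windows.net/extracts/started.marker",
    headers={
        "Authorization": f"Bearer {token}",
        "x-ms-version": "2021-08-06",  # OAuth calls need a recent service version
        "x-ms-blob-type": "BlockBlob",
        "If-None-Match": "*",          # create only if the blob does not exist yet
    },
    data=b"",                          # zero-byte marker file
)

if resp.status_code == 201:
    print("started.marker created - this run owns the extract")
elif resp.status_code == 409:
    # Error code BlobAlreadyExists: another environment won the race
    print("marker already exists - skipping / retrying")
else:
    resp.raise_for_status()
```

The key point is that the existence check and the create happen in a single request on the storage side, so two environments can never both "win": exactly one caller gets 201 Created and every other caller gets 409 Conflict.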
Benefits

- Atomic creation of started.marker → no race conditions.
- Minimal change to the existing marker-file pipeline logic.
- Reliable downstream loads into Azure SQL Server.
- Preserves the existing architecture (no full redesign).

Would love to hear: Have you used similar marker-based patterns in ADF? Any other approaches to concurrency control that worked for your team?

Thanks for reading! Hope this helps someone facing similar issues.

What Are the Ways to Dynamically Invoke Pipelines in ADF from Another Pipeline?

I am exploring different approaches to dynamically invoking ADF pipelines from within another pipeline as part of a modular and scalable orchestration strategy. My use case involves multiple reusable pipelines that can be called conditionally or in sequence, based on configuration stored externally (such as in a SQL Managed Instance or another Azure-native source). I am aware of a few patterns, like using the Execute Pipeline activity within a ForEach loop, but I would like to understand the full range of available and supported options for dynamically invoking pipelines from within ADF. Could you please clarify the possible approaches? Specifically, I am interested in:

1. Using ForEach with the Execute Pipeline activity
   - How to structure the control flow for calling multiple pipelines in sequence or in parallel.
   - How to pass pipeline names dynamically.
2. Dynamic pipeline name resolution
   - Is it possible to pass the pipeline name as a parameter to the Execute Pipeline activity?
   - How to handle validation when the pipeline name is dynamic?
3. Parameterized execution
   - Best practices for passing dynamic parameters to each pipeline when calling them in a loop or based on external config.
4. Calling ADF pipelines via the REST API or a Web Activity (a sketch of the call I have in mind is below)
   - When would this be preferred over the native Execute Pipeline activity?
   - How to handle authentication and response handling?

If there are any recommendations, gotchas, or best practices related to dynamic pipeline orchestration in ADF, I would greatly appreciate your insights. Thanks!
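To make option 4 concrete, here is a rough Python sketch of the Create Run call I have in mind, with the pipeline name resolved at runtime. The subscription, resource group, factory, and pipeline names are placeholders on my side, not a working setup:

```python
import requests
from azure.identity import DefaultAzureCredential

# Placeholder identifiers - replace with real values.
SUBSCRIPTION = "<subscription-id>"
RESOURCE_GROUP = "rg-data"
FACTORY = "adf-orchestration"

# Resolved at runtime, e.g. read from a config table in SQL MI.
pipeline_name = "pl_extract_orders"

token = DefaultAzureCredential().get_token(
    "https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION}"
    f"/resourceGroups/{RESOURCE_GROUP}"
    f"/providers/Microsoft.DataFactory/factories/{FACTORY}"
    f"/pipelines/{pipeline_name}/createRun?api-version=2018-06-01"
)

# The request body carries the parameters of the target pipeline.
resp = requests.post(
    url,
    headers={"Authorization": f"Bearer {token}"},
    json={"sourceSystem": "oracle", "loadDate": "2025-01-01"},
)
resp.raise_for_status()
print("queued run:", resp.json()["runId"])
```

Because the pipeline name is just a path segment in the URL, it can come from anywhere at runtime, which is what I mean by "dynamic" here; validating that the named pipeline actually exists before calling it is part of what I am asking about.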
How to Configure Authentication for Web Activity Triggering ADF Pipelines via Azure REST API

Hello, I am working on integrating Azure Data Factory (ADF) with external systems using Web Activities. I am specifically using a Web Activity to trigger ADF pipelines via the Azure REST API, as described in the official documentation here: https://learn.microsoft.com/en-us/rest/api/datafactory/pipelines/create-run?view=rest-datafactory-2018-06-01

I can configure the request method and URL in the Web Activity, but I am unsure about the supported and recommended methods for authentication. Could someone please clarify:

- What are the possible ways to configure authentication in Web Activities when calling Azure REST APIs (such as for creating a pipeline run)?
- Is it possible to use a Managed Identity (system-assigned or user-assigned) directly within the Web Activity?
- If not, what are the alternatives (e.g., a service principal with token acquisition)? A sketch of the flow I would expect is below.
- Are there any best practices or security considerations when configuring authentication for this use case?

Thanks in advance for your help!
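For context, this is the service principal fallback I would expect to use if the managed identity route turns out not to be supported, sketched in Python. All identifiers are placeholders, and in practice the client secret would come from Key Vault rather than source control:

```python
import requests
from azure.identity import ClientSecretCredential

# Placeholder service principal - the secret should live in Key Vault.
credential = ClientSecretCredential(
    tenant_id="<tenant-id>",
    client_id="<app-registration-client-id>",
    client_secret="<client-secret>",
)

# Create Run is a management-plane API, so the token audience is ARM.
token = credential.get_token("https://management.azure.com/.default").token

# The same Authorization header the Web Activity would have to send.
run_url = (
    "https://management.azure.com/subscriptions/<subscription-id>"
    "/resourceGroups/<resource-group>/providers/Microsoft.DataFactory"
    "/factories/<factory-name>/pipelines/<pipeline-name>/createRun"
    "?api-version=2018-06-01"
)
resp = requests.post(run_url,
                     headers={"Authorization": f"Bearer {token}"},
                     json={})  # body = pipeline parameters, empty here
print(resp.status_code, resp.json())
```

My understanding is that the Web Activity also has an authentication setting that can attach a managed identity token for a given resource URI (https://management.azure.com/ in this case), which would avoid handling tokens manually, but I would appreciate confirmation and any gotchas.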