Azure ETL
137 Topics

Announcing the new Databricks Job activity in ADF!
We’re excited to announce that Azure Data Factory now supports the orchestration of Databricks Jobs!

Databricks Jobs allow you to schedule and orchestrate a task or multiple tasks in a workflow in your Databricks workspace. Since any operation in Databricks can be a task, this means you can now run anything in Databricks via ADF, such as serverless jobs, SQL tasks, Delta Live Tables, batch inferencing with model serving endpoints, or automatically publishing and refreshing semantic models in the Power BI service. With this update, you’ll be able to trigger these workflows from your Azure Data Factory pipelines.

To make use of this new activity, you’ll find a new Databricks activity under the Databricks activity group called Job. Once you’ve added the Job activity (Preview) to your pipeline canvas, you can connect to your Databricks workspace and configure the settings to select your Databricks job, allowing you to run the Job from your pipeline.

We also know that allowing parameterization in your pipelines is important, as it allows you to create generic, reusable pipeline models. ADF continues to provide support for these patterns and is excited to extend this capability to the new Databricks Job activity. Under the settings of your Job activity, you’ll also be able to configure and set parameters to send to your Databricks job, allowing maximum flexibility and power for your orchestration jobs.

To learn more, read Azure Databricks activity - Microsoft Fabric | Microsoft Learn.

Have any questions or feedback? Leave a comment below!

5K Views, 1 like, 2 Comments

Solution: Handling Concurrency in Azure Data Factory with Marker Files and Web Activities
Hi everyone,

I wanted to share a concurrency issue we encountered in Azure Data Factory (ADF) and how we resolved it using a small but effective enhancement. It might be useful if you're working with shared Blob Storage across multiple environments (like Dev, Test, and Prod).

Background: Shared Blob Storage & Marker Files

In our ADF pipelines, we extract data from various sources (e.g., SharePoint, Oracle) and store it in Azure Blob Storage. That Blob container is shared across multiple environments. To prevent duplicate extractions, we use marker files:

- started.marker → created when a copy begins
- completed.marker → created when the copy finishes successfully

If both markers exist, pipelines reuse the existing file (caching logic). This mechanism was already in place and worked well under normal conditions.

The Issue: Race Conditions

We observed that simultaneous executions from multiple environments sometimes led to:

- Overlapping attempts to create the same started.marker
- Duplicate copy activities
- Corrupted Blob files

This became a serious concern because the Blob file was later loaded into Azure SQL Server, and any corruption led to failed loads.

The Fix: Web Activity + REST API

To solve this, we modified only the creation of started.marker:

- We replaced the Copy Activity with a Web Activity that calls the Azure Storage REST API.
- The API call uses Azure Blob Storage's conditional header If-None-Match: * to safely create the file only if it doesn't exist.
- If the file already exists, the API returns "BlobAlreadyExists", which the pipeline handles by skipping.

The Copy Activity is still used to copy the data and create the completed.marker; no changes were needed there.
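As a rough illustration of the conditional create described above, here is a minimal Python sketch of the same REST call made outside ADF. It is not the author's Web Activity configuration: the function names, the bearer-token authentication, and the API version string are assumptions chosen for the example. It also assumes the "BlobAlreadyExists" error arrives as HTTP 409, and treats 412 the same way as a precaution.

```python
import urllib.error
import urllib.request

# Assumed for illustration: a storage REST API version that accepts OAuth bearer tokens.
API_VERSION = "2021-08-06"


def marker_headers(bearer_token: str) -> dict:
    """Headers for a Put Blob request that should succeed only if the blob is absent."""
    return {
        "Authorization": f"Bearer {bearer_token}",
        "x-ms-version": API_VERSION,
        "x-ms-blob-type": "BlockBlob",
        "If-None-Match": "*",  # the conditional header named in the post
    }


def classify(status: int) -> str:
    """Map an HTTP status to what the pipeline should do next."""
    if status == 201:
        return "claimed"          # this run created started.marker
    if status in (409, 412):      # assumption: how "BlobAlreadyExists" surfaces
        return "already_started"  # another run got there first
    return "error"


def try_create_marker(blob_url: str, bearer_token: str) -> str:
    """Attempt to create an empty started.marker blob at blob_url (hypothetical URL)."""
    request = urllib.request.Request(
        blob_url,
        data=b"",  # empty body; urllib derives Content-Length: 0 from it
        method="PUT",
        headers=marker_headers(bearer_token),
    )
    try:
        with urllib.request.urlopen(request) as response:
            return classify(response.status)
    except urllib.error.HTTPError as err:
        return classify(err.code)
```

Because the storage service evaluates the condition, only one of several simultaneous callers can get the "claimed" result; the others see "already_started" and can skip or retry.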
Updated Flow

1. Check marker files:
   - If both exist (started and completed) → use cached file
   - If only started.marker exists → wait and retry
   - If none → continue to step 2
2. Web Activity calls REST API to create started.marker:
   - Success → proceed with copy in step 3
   - Failure → another run already started → skip/retry
3. Copy Activity performs the data extract
4. Copy Activity creates completed.marker

Benefits

- Atomic creation of started.marker → no race conditions
- Minimal change to existing pipeline logic with marker files
- Reliable downstream loads into Azure SQL Server
- Preserves existing architecture (no full redesign)

Would love to hear:

- Have you used similar marker-based patterns in ADF?
- Any other approaches to concurrency control that worked for your team?

Thanks for reading! Hope this helps someone facing similar issues.

45 Views, 0 likes, 0 Comments
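The marker-file flow from the concurrency post can also be written down as a small decision function. This is only a sketch of the branching logic, not ADF pipeline code: the names Action, next_action, and run_step are hypothetical, and the case where completed.marker exists without started.marker is not covered in the post, so it is treated here like "no markers".

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    USE_CACHED = "use cached file"
    WAIT_AND_RETRY = "wait and retry"
    CLAIM_AND_COPY = "create started.marker, then copy"


def next_action(started_exists: bool, completed_exists: bool) -> Action:
    """Step 1 of the flow: decide from the marker files alone."""
    if started_exists and completed_exists:
        return Action.USE_CACHED
    if started_exists:
        return Action.WAIT_AND_RETRY
    # Assumption: anything else is handled like "no markers".
    return Action.CLAIM_AND_COPY


def run_step(started_exists: bool, completed_exists: bool,
             try_claim: Callable[[], bool]) -> Action:
    """Steps 1 and 2: try_claim stands in for the conditional marker creation."""
    action = next_action(started_exists, completed_exists)
    if action is Action.CLAIM_AND_COPY and not try_claim():
        # Another run created started.marker between the check and the claim.
        return Action.WAIT_AND_RETRY
    return action
```

The second function shows why the atomic create matters: the check in step 1 can be stale by the time step 2 runs, so the claim itself has to be the deciding operation.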
How to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow?

Hi Community,

I'm trying to extract and load data from an API that returns the following JSON format into an Azure SQL table using Azure Data Factory.

    {
      "2023-07-30": [],
      "2023-07-31": [],
      "2023-08-01": [
        { "breakdown": "email", "contacts": 2, "customers": 2 }
      ],
      "2023-08-02": [],
      "2023-08-03": [
        { "breakdown": "direct", "contacts": 5, "customers": 1 },
        { "breakdown": "referral", "contacts": 3, "customers": 0 }
      ],
      "2023-08-04": [],
      "2023-09-01": [
        { "breakdown": "direct", "contacts": 76, "customers": 40 }
      ],
      "2023-09-02": [],
      "2023-09-03": []
    }

Goal: I want to flatten this nested structure and load it into Azure SQL like this:

    ReportDate   Breakdown   Contacts   Customers
    2023-07-30   (no row)    (no row)   (no row)
    2023-07-31   (no row)    (no row)   (no row)
    2023-08-01   email       2          2
    2023-08-02   (no row)    (no row)   (no row)
    2023-08-03   direct      5          1
    2023-08-03   referral    3          0
    2023-08-04   (no row)    (no row)   (no row)
    2023-09-01   direct      76         40
    2023-09-02   (no row)    (no row)   (no row)
    2023-09-03   (no row)    (no row)   (no row)

65 Views, 0 likes, 1 Comment

ADF dataflow data Preview Error
Hi all,

I have a data flow as seen below. All linked services and datasets work fine, and I can see the data preview there, but when I use the same linked service and dataset in the data flow, it throws the error shown below. I am using a managed private endpoint to connect to Blob Storage, and it works for all pipelines. The ADF and the MI have the Storage Account Contributor role assigned.

Error:

at Source 'sourcedata': This request is not authorized to perform this operation. When using Managed Identity(MI)/Service Principal(SP) authentication
1. For source: In Storage Explorer, grant the MI/SP at least Execute permission for ALL upstream folders and the file system, along with Read permission for the files to copy. Alternatively, in Access control (IAM), grant the MI/SP at least the Storage Blob Data Reader role.
2. For sink: In Storage Explorer, grant the MI/SP at least Execute permission for ALL upstream folders and the file system, along with Write permission for the sink folder. Alternatively, in Access control (IAM), grant the MI/SP at least the Storage Blob Data Contributor role.
Also please ensure that the network firewall settings in the storage account are configured correctly as turning on firewall rules for your storage account blocks incoming requests for data by default, unless the requests originate from a service operating within an Azure Virtual Network (VNet) or from allowed public IP addresses.

Any kind of help is highly appreciated.

156 Views, 0 likes, 1 Comment

ADF Data Flow Fails with "Path does not resolve to any file" — Dynamic Parameters via Trigger
Hi guys,

I'm running into an issue with my Azure Data Factory pipeline triggered by a Blob event. The trigger passes dynamic folderPath and fileName values into a parameterized dataset and mapping data flow.

Everything works perfectly when I debug the pipeline manually, or when I trigger it manually and pass in the values for folderPath and fileName directly. However, when the pipeline is triggered automatically via the blob event, the data flow fails with the following error:

Error Message:
Job failed due to reason: at Source 'CSVsource': Path /financials/V02/Forecast/ForecastSampleV02.csv does not resolve to any file(s). Please make sure the file/folder exists and is not hidden. At the same time, please ensure special character is not included in file/folder name, for example, name starting with _

I've verified that:

- The blob file exists.
- The trigger fires correctly and passes parameters.
- The path looks valid.
- The dataset is parameterized correctly with @dataset().folderPath and @dataset().fileName.

I've attached screenshots of:

🔵 00-Pipeline Trigger Configuration On Blob creation
🔵 01-Trigger Parameters
🔵 02-Pipeline Parameters
🔵 03-Data flow Parameters
🔵 04-Data flow Parameters without default value
🔵 05-Data flow CSVsource parameters
🔵 06-Data flow Source Dataset
🔵 07-Data flow Source dataset Parameters
🔵 08-Data flow Source Parameters
🔵 09-Parameters passed to the pipeline from the trigger
🔵 10-Data flow error message

https://primeinnovativetechnologies-my.sharepoint.com/:b:/g/personal/john_primeinntech_com/EYoH5Sm_GaFGgvGAOEpbdXQB7QJFeXvbFmCbZiW85PwrNA?e=0yjeJR

What could be causing the data flow to fail on file path resolution only when triggered, even though the exact same parameters succeed during manual debug runs? Could this be related to:

- Extra slashes or encoding in trigger output?
- Misuse of @dataset().folderPath and fileName in the dataset?
- Limitations in how blob trigger outputs are parsed?

Any insights would be appreciated!
Thank you

Solved, 163 Views, 0 likes, 1 Comment

How to Configure Authentication for Web Activity Triggering ADF Pipelines via Azure REST API
Hello,

I am working on integrating Azure Data Factory (ADF) with external systems using Web Activities. I am specifically using a Web Activity to trigger ADF pipelines via the Azure REST API, as described in the official documentation here:

https://learn.microsoft.com/en-us/rest/api/datafactory/pipelines/create-run?view=rest-datafactory-2018-06-01

I can configure the request method and URL in the Web Activity, but I am unsure about the supported and recommended methods for authentication. Could someone please clarify:

- What are the possible ways to configure authentication in Web Activities when calling Azure REST APIs (such as for creating a pipeline run)?
- Is it possible to use Managed Identity (System-assigned or User-assigned) directly within the Web Activity?
- If not, what are the alternatives (e.g., service principal with token acquisition)?
- Are there any best practices or security considerations when configuring authentication for this use case?

Thanks in advance for your help!

55 Views, 0 likes, 0 Comments

Excel column header verification using schema in database
I have a requirement where we need to do a data quality check on the Excel files in Azure Blob against the schema stored in the database.

Azure Blob has a container in which we have multiple Excel files with data. These files generally follow a structure and a few business rules. For example, if the data is related to employees, there will be 10 columns, all rows in colA = 'abc' (same data), colB should be a date in some format, colC is a number less than 5, and so on. Similarly, different Excel files have different headers, numbers of columns, structures, and business rules.

A table is maintained in the database with the structure and business rules.

    ExcelTemplateId  ExcelTemplateName  ColumnName  MaxLength  DataType  DefaultValue
    1  abc  name  255  varchar
    1  abc  empId  10  int
    1  abc  dept  100  xyz

I need to create an ADF pipeline which will read the Excel files one by one from the source, compare them with the schema (present in the database), and copy the good data to location01 and the bad data to location02. Location01 and location02 can be tables in the database.

I do not wish to create one pipeline for each Excel sheet; rather, it should be a dynamic one that handles all Excel files. How can I achieve this?

78 Views, 0 likes, 0 Comments

OData Connector for Dynamics Business Central
Hey Guys,

I'm trying to connect to the Dynamics Business Central OData API in ADF, but I'm not sure what I'm doing wrong: the same endpoint returns data in Postman but returns an error in the ADF linked service.

https://api.businesscentral.dynamics.com/v2.0/{tenant-id}/Sandbox-UAT/ODataV4/Company('company-name')/Chart_of_Accounts

187 Views, 0 likes, 1 Comment