azure data factory
391 TopicsSharePoint Online Multiple Files (Folder) Copy with Http Connector
This blog shows how to copy multiple files from a folder from SharePoint Online using ADF. Go through this public documentation on how to copy a single file - Copy data from SharePoint Online List by using Azure Data Factory - Azure Data Factory | Microsoft Docs31KViews9likes36CommentsGetting an Oauth2 API access token using client_id and client_secret - help
Hi, I'm attempting to integrate external data into our SQL Server. The third-party data is from a solution called iLevel. They use token based Oauth2 APIs for access. The integration tool is ADF Pipelines. I'm not a data engineer but it has fallen upon me to complete this exercise. What I've attempted so far is failing and I don't know why. I would like your help on this. I'll explain what I've configured so far in the order I configured it. 1) To generate a client_id and client_secret, I logged on to the iLevel solution itself and generated the same for my account (call me 'Joe' account) and the Team account (call it 'Data team' account). I've recorded the client_id and client_secret for both users/accounts in notepad for reference. 2) I logged in Azure Data Factory using my 'Joe Admin' admin account (this is the account I need to log in with for any ADF development). 3) I created a Linked Service with the following configuration. Note how the Test connection was successful. I guess this means our ADF instance can connect to iLevel's Base URL. 4) I then created a dataset for iLevel. I configured this based on an online example I was following which I can't get working, so this configuration may be incorrect. 5) I then created a Pipeline which contains a 'Web' activity and a 'Set variable' activity. The Pipeline has a variable as shown below. The 'Web' activity has the following configuration: URL = is iLevel's token URL (it is different from the Base URL used in the Linked Service). Body = I've blocked out the client_id and client_secret (I'm using the client_id and client_secret generated for the 'Data team' account - remember I'm logged into ADF using the 'Joe Admin' account - not sure if this makes a difference) but have placed red brackets around where the start and end of each values is. I'm wrapping the values in any single or double quotes - not sure if I'm meant to. I'm not sure if I have configured the Body correctly. The ilevel documentation states to use an Authorization header, Content-Type header and Body - it states to the following is needed to obtain an access token, but it doesn't state exactly how to submit the information (i.e. how to format it). Notice how, in my configuration, I haven't used an Authorization header - this partly because an online example I've followed doesn't use one. If iLevel state to use one then I think I should but I don't know how to format it - any ideas? The 'Set variable' activity has the following activity. The idea is the access token is retrieved from the 'Web' activity and placed in the 'Set variable' "iLevel access token" variable. At this point I validate all and it comes back with no errors found. I then Debug it to see if it does indeed work but it returns an error stating the request contains an invalid client_id or client_secret. The client_id and client_secret values used are the exact same I generated from within the iLevel solution just a few hours ago. Is anyone able to point out to me why this isn't working? Have I populated all that I need to (as mentioned, iLevel say to use an Authorization header which I haven't but I don't know how to format it if I were to use one)? What can I do to get this working? I'm just trying to get the access token at the moment. I've not even attempted to extract the iLevel data and can't until I get a working token. iLevel's token have a 1 hour time-to-live so the Pipeline needs to generate a new token each time it's executed. You help will be most appreciated. Thanks.44Views1like0CommentsDynamics AX connector stops getting records after amount of time
Hello everyone, I am using the Dynamics AX connector to get data out of Finance. After a certain amount of time it suddenly doesnt get any new records anymore and it keeps running until it reaches the general timeout. It gets 290,000 records in like 01:30:00 and then keeps running and doesn't get new records anymore. Sometimes it gets stuck earlier or later. Sometimes it also gives me this error: Failure happened on 'Source' side. ErrorCode=ODataRequestTimeout,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Fail to get response from odata service in a expected time.,Source=Microsoft.DataTransfer.Runtime.ODataConnector,''Type=System.Threading.Tasks.TaskCanceledException,Message=A task was canceled.,Source=mscorlib,' This is my pipeline JSON: { "name": "HICT - Init Sync SalesOrders", "properties": { "activities": [ { "name": "Get FO SalesOrders", "type": "Copy", "dependsOn": [], "policy": { "timeout": "0.23:00:00", "retry": 0, "retryIntervalInSeconds": 30, "secureOutput": false, "secureInput": false }, "userProperties": [], "typeProperties": { "source": { "type": "DynamicsAXSource", "query": "$filter=FM_InterCompanyOrder eq Microsoft.Dynamics.DataEntities.NoYes'No' and dataAreaId eq 'prev'&$select=SalesOrderNumber,SalesOrderName,IsDeliveryAddressPrivate,FormattedInvoiceAddress,FormattedDeliveryAddress,ArePricesIncludingSalesTax,RequestedReceiptDate,QuotationNumber,PriceCustomerGroupCode,PBS_PreferredInvoiceDate,PaymentTermsBaseDate,OrderTotalTaxAmount,OrderTotalChargesAmount,OrderTotalAmount,TotalDiscountAmount,IsInvoiceAddressPrivate,InvoiceBuildingCompliment,InvoiceAddressZipCode,LanguageId,IsDeliveryAddressOrderSpecific,IsOneTimeCustomer,InvoiceAddressStreetNumber,InvoiceAddressStreet,InvoiceAddressStateId,InvoiceAddressPostBox,InvoiceAddressLongitude,InvoiceAddressLatitude,InvoiceAddressDistrictName,InvoiceAddressCountyId,InvoiceAddressCountryRegionISOCode,InvoiceAddressCity,FM_Deadline,Email,DeliveryTermsCode,DeliveryModeCode,DeliveryBuildingCompliment,DeliveryAddressCountryRegionISOCode,DeliveryAddressZipCode,DeliveryAddressStreetNumber,SalesOrderStatus,DeliveryAddressStreet,DeliveryAddressStateId,SalesOrderPromisingMethod,DeliveryAddressPostBox,DeliveryAddressName,DeliveryAddressLongitude,DeliveryAddressLocationId,DeliveryAddressLatitude,DeliveryAddressDunsNumber,DeliveryAddressDistrictName,DeliveryAddressDescription,DeliveryAddressCountyId,DeliveryAddressCity,CustomersOrderReference,IsSalesProcessingStopped,CustomerRequisitionNumber,SalesOrderProcessingStatus,CurrencyCode,ConfirmedShippingDate,ConfirmedReceiptDate,SalesOrderOriginCode,URL,OrderingCustomerAccountNumber,InvoiceCustomerAccountNumber,ContactPersonId,FM_WorkerSalesTaker,FM_SalesResponsible,PaymentTermsName,DefaultShippingSiteId,DefaultShippingWarehouseId,DeliveryModeCode,dataAreaId,FM_InterCompanyOrder&cross-company=true", "httpRequestTimeout": "00:15:00", "additionalHeaders": { "Prefer": "odata.maxpagesize=1000" }, "retrieveEnumValuesAsString": true }, "sink": { "type": "JsonSink", "storeSettings": { "type": "AzureBlobStorageWriteSettings", "copyBehavior": "FlattenHierarchy" }, "formatSettings": { "type": "JsonWriteSettings" } }, "enableStaging": false, "enableSkipIncompatibleRow": true, "logSettings": { "enableCopyActivityLog": true, "copyActivityLogSettings": { "logLevel": "Warning", "enableReliableLogging": false }, "logLocationSettings": { "linkedServiceName": { "referenceName": "AzureBlobStorage", "type": "LinkedServiceReference" }, "path": "ceexports" } } }, "inputs": [ { "referenceName": "AX_SalesOrders_Dynamics_365_FO_ACC", "type": "DatasetReference" } ], "outputs": [ { "referenceName": "Orders_FO_D365_Data_JSON", "type": "DatasetReference" } ] }, { "name": "Get_All_CE_Table_Data", "type": "ForEach", "dependsOn": [ { "activity": "Get FO SalesOrders", "dependencyConditions": [ "Completed" ] } ], "userProperties": [], "typeProperties": { "items": { "value": "@pipeline().parameters.CE_Tables", "type": "Expression" }, "activities": [ { "name": "Copy_CE_TableData", "type": "Copy", "dependsOn": [], "policy": { "timeout": "0.12:00:00", "retry": 0, "retryIntervalInSeconds": 30, "secureOutput": false, "secureInput": false }, "userProperties": [], "typeProperties": { "source": { "type": "CommonDataServiceForAppsSource" }, "sink": { "type": "DelimitedTextSink", "storeSettings": { "type": "AzureBlobStorageWriteSettings", "copyBehavior": "FlattenHierarchy" }, "formatSettings": { "type": "DelimitedTextWriteSettings", "quoteAllText": true, "fileExtension": ".txt" } }, "enableStaging": false }, "inputs": [ { "referenceName": "CE_Look_Up_Tables", "type": "DatasetReference", "parameters": { "entiryName": "@item().sourceDataset" } } ], "outputs": [ { "referenceName": "CE_GenericBlobSink", "type": "DatasetReference", "parameters": { "sinkPath": { "value": "@item().sinkPath", "type": "Expression" } } } ] } ] } }, { "name": "Transform_Create_CE_JSON", "type": "ExecuteDataFlow", "dependsOn": [ { "activity": "Get_All_CE_Table_Data", "dependencyConditions": [ "Succeeded" ] } ], "policy": { "timeout": "0.12:00:00", "retry": 0, "retryIntervalInSeconds": 30, "secureOutput": false, "secureInput": false }, "userProperties": [], "typeProperties": { "dataflow": { "referenceName": "FO_Transform_CE_Select", "type": "DataFlowReference" }, "compute": { "coreCount": 16, "computeType": "General" }, "traceLevel": "Fine" } } ], "parameters": { "CE_Tables": { "type": "array", "defaultValue": [ { "name": "D365_CE_ACC_AccountRelations", "sourceDataset": "crmp_accountrelation", "sinkPath": "ce-exports/D365_CE_ACC_AccountRelations.json" }, { "name": "D365_CE_ACC_ContactRelations", "sourceDataset": "crmp_contactrelation", "sinkPath": "ce-exports/D365_CE_ACC_ContactRelations.json" }, { "name": "D365_CE_ACC_PriceCustomerGroup", "sourceDataset": "msdyn_pricecustomergroup", "sinkPath": "ce-exports/D365_CE_ACC_PriceCustomerGroup.json" }, { "name": "D365_CE_ACC_SalesOrderOrigin", "sourceDataset": "odin_salesorderorigin", "sinkPath": "ce-exports/D365_CE_ACC_SalesOrderOrigin.json" }, { "name": "D365_CE_ACC_ShipVia", "sourceDataset": "msdyn_shipvia", "sinkPath": "ce-exports/D365_CE_ACC_ShipVia.json" }, { "name": "D365_CE_ACC_SystemUser", "sourceDataset": "systemuser", "sinkPath": "ce-exports/D365_CE_ACC_SystemUser.json" }, { "name": "D365_CE_ACC_TermsOfDelivery", "sourceDataset": "msdyn_termsofdelivery", "sinkPath": "ce-exports/D365_CE_ACC_TermsOfDelivery.json" }, { "name": "D365_CE_ACC_Worker", "sourceDataset": "cdm_worker", "sinkPath": "ce-exports/D365_CE_ACC_Worker.json" }, { "name": "D365_CE_ACC_TransactionCurrency", "sourceDataset": "transactioncurrency", "sinkPath": "ce-exports/D365_CE_ACC_TransactionCurrency.json" }, { "name": "D365_CE_ACC_Warehouse", "sourceDataset": "msdyn_warehouse", "sinkPath": "ce-exports/D365_CE_ACC_Warehouse.json" }, { "name": "D365_CE_ACC_OperationalSite", "sourceDataset": "msdyn_operationalsite", "sinkPath": "ce-exports/D365_CE_ACC_OperationalSite.json" }, { "name": "D365_CE_ACC_PaymentTerms", "sourceDataset": "odin_paymentterms", "sinkPath": "ce-exports/D365_CE_ACC_PaymentTerms.json" } ] } }, "annotations": [], "lastPublishTime": "2025-07-30T12:55:32Z" }, "type": "Microsoft.DataFactory/factories/pipelines" }46Views0likes0CommentsAzure Data Factory ForEach Loop Fails Despite Inner Activity Error Handling - Seeking Best Practices
Hello Azure Data Factory Community, I'm encountering a persistent issue with my ADF pipeline where a ForEach loop is failing, even though I've implemented error handling for the inner activities. I'm looking for insights and best practices on how to prevent internal activity failures from propagating up and causing the entire ForEach loop (and subsequently the pipeline) to fail, while still logging all outcomes. My Setup: My pipeline processes records using a ForEach loop. Inside the loop, I have a Web activity (Sample_put_record) that calls an external API. This API call can either succeed or fail for individual records. My current error handling within the ForEach iteration is structured as follows: 1.Sample_put_record (Web Activity): Makes the API call. 2.Conditional Logic: I've tried two main approaches: •Approach A (Direct Success/Failure Paths): The Sample_put_record activity has a green arrow (on success) leading to a Log Success Items (Script activity) and a red arrow (on failure) leading to a Log Failed Items (Script activity). Both logging activities are followed by Wait activities (Dummy Wait For Success/Failure). •Approach B (If Condition Wrapper): I've wrapped the Sample_put_record activity and its success/failure logging within an If Condition activity. The If Condition's expression is @equals(activity('Sample_put_record').status, 'Succeeded'). The True branch contains the success logging, and the False branch contains the failure logging. The intention here was for the If Condition to always report success, regardless of the Sample_put_record outcome, to prevent the ForEach from failing. The Problem: Despite these error handling attempts, the ForEach loop (and thus the overall pipeline) still fails when an Sample_put_record activity fails. The error message I typically see for the ForEach activity is "Activity failed because an inner activity failed." When using the If Condition wrapper, the If Condition itself sometimes fails with the same error, indicating that an activity within its True or False branch is still causing a hard failure. For example, a common failure for Sample_put_record is: "valid":false,"message":"WARNING: There was no xxxxxxxxxxxxxxxxxxxxxxxxx scheduled..." (a user configuration/data issue). Even when my Log Failed Items script attempts to capture this, the ForEach still breaks. What I've Ensured/Considered: •Wait Activity Configuration: Wait activities are configured with reasonable durations and do not appear to be the direct cause of failure. •No Unhandled Exceptions: I'm trying to ensure no unhandled exceptions are propagating from my error handling activities. •Pipeline Status Goal: My ultimate goal is for the overall pipeline status to be Succeeded as long as the pipeline completes its execution, even if some Sample_put_record calls fail and are logged. I need to rely on the logs to identify actual failures, not the pipeline status. My Questions to the Community: 1.What is the definitive best practice in Azure Data Factory to ensure a ForEach loop never fails due to an inner activity failure, assuming the inner activity's failure is properly logged and handled within that iteration? 2.Are there specific nuances or common pitfalls with If Condition activities or Script activities within ForEach loops that could still cause failure propagation, even with try-catch and success exits? 3.How do you typically structure your ADF pipelines to achieve this level of resilience where internal failures are logged but don't impact the overall pipeline success status? 4.Are there any specific configurations on the ForEach activity itself (e.g., Continue on error setting, if it exists for ForEach?) or other activities that I might be overlooking? Any detailed examples, architectural patterns, or debugging tips would be greatly appreciated. Thank you in advance for your help!84Views0likes0CommentsOracle 2.0 Upgrade Woes with Self-Hosted Integration Runtime
This past weekend my ADF instance finally got the prompt to upgrade linked services that use the Oracle 1.0 connector, so I thought, "no problem!" and got to work upgrading my self-hosted integration runtime to 5.50.9171.1 Most of my connection use service_name during authentication, so https://learn.microsoft.com/en-us/azure/data-factory/connector-oracle?tabs=data-factory, I should be able to connect using the Easy Connect (Plus) Naming convention. When I do, I encounter this error: Test connection operation failed. Failed to open the Oracle database connection. ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string ORA-12650: No common encryption or data integrity algorithm https://docs.oracle.com/error-help/db/ora-12650/ I did some digging on this error code, and the troubleshooting doc suggests that I reach out to my Oracle DBA to update Oracle server settings. Which, I did, but I have zero confidence the DBA will take any action. https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-oracle Then I happened across this documentation about the upgraded connector. https://learn.microsoft.com/en-us/azure/data-factory/connector-oracle?tabs=data-factory#upgrade-the-oracle-connector Is this for real? ADF won't be able to connect to old versions of Oracle? If so I'm effed because my company is so so legacy and all of our Oracle servers at 11g. I also tried adding additional connection properties in my linked service connection like this, but I have honestly no idea what I'm doing: Encryption client: accepted Encryption types client: AES128, AES192, AES256, 3DES112, 3DES168 Crypto checksum client: accepted Crypto checksum types client: SHA1, SHA256, SHA384, SHA512 But no matter what, the issue persists. :( Am I missing something stupid? Are there ways to handle the encryption type mismatch client-side from the VM that runs the self-hosted integration runtime? I would hate to be in the business of managing an Oracle environment and tsanames.ora files, but I also don't want to re-engineer almost 100 pipelines because of a connector incompatability.Solved5.7KViews3likes15CommentsSolution: Handling Concurrency in Azure Data Factory with Marker Files and Web Activities
Hi everyone, I wanted to share a concurrency issue we encountered in Azure Data Factory (ADF) and how we resolved it using a small but effective enhancement—one that might be useful if you're working with shared Blob Storage across multiple environments (like Dev, Test, and Prod). Background: Shared Blob Storage & Marker Files In our ADF pipelines, we extract data from various sources (e.g., SharePoint, Oracle) and store them in Azure Blob Storage. That Blob container is shared across multiple environments. To prevent duplicate extractions, we use marker files: started.marker → created when a copy begins completed.marker → created when the copy finishes successfully If both markers exist, pipelines reuse the existing file (caching logic). This mechanism was already in place and worked well under normal conditions. The Issue: Race Conditions We observed that simultaneous executions from multiple environments sometimes led to: Overlapping attempts to create the same started.marker Duplicate copy activities Corrupted Blob files This became a serious concern because the Blob file was later loaded into Azure SQL Server, and any corruption led to failed loads. The Fix: Web Activity + REST API To solve this, we modified only the creation of started.marker by: Replacing Copy Activity with a Web Activity that calls the Azure Storage REST API The API uses Azure Blob Storage's conditional header If-None-Match: * to safely create the file only if it doesn't exist If the file already exists, the API returns "BlobAlreadyExists", which the pipeline handles by skipping. The Copy Activity is still used to copy the data and create the completed.marker—no changes needed there. Updated Flow Check marker files: If both exist (started and completed) → use cached file If only started.marker → wait and retry If none → continue to step 2 Web Activity calls REST API to create started.marker Success → proceed with copy in step 3 Failure → another run already started → skip/retry Copy Activity performs the data extract Copy Activity creates completed.marker Benefits Atomic creation of started.marker → no race conditions Minimal change to existing pipeline logic with marker files Reliable downstream loads into Azure SQL Server Preserves existing architecture (no full redesign) Would love to hear: Have you used similar marker-based patterns in ADF? Any other approaches to concurrency control that worked for your team? Thanks for reading! Hope this helps someone facing similar issues.38Views0likes0CommentsHow to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow?
How to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow? Hi Community, I'm trying to extract and load data from API returning the following JSON format into an Azure SQL table using Azure Data Factory. { "2023-07-30": [], "2023-07-31": [], "2023-08-01": [ { "breakdown": "email", "contacts": 2, "customers": 2 } ], "2023-08-02": [], "2023-08-03": [ { "breakdown": "direct", "contacts": 5, "customers": 1 }, { "breakdown": "referral", "contacts": 3, "customers": 0 } ], "2023-08-04": [], "2023-09-01": [ { "breakdown": "direct", "contacts": 76, "customers": 40 } ], "2023-09-02": [], "2023-09-03": [] } Goal: I want to flatten this nested structure and load it into Azure SQL like this: Expand table ReportDate Breakdown Contacts Customers 2023-07-30 (no row) (no row) (no row) 2023-07-31 (no row) (no row) (no row) 2023-08-01 email 2 2 2023-08-02 (no row) (no row) (no row) 2023-08-03 direct 5 1 2023-08-03 referral 3 0 2023-08-04 (no row) (no row) (no row) 2023-09-01 direct 76 40 2023-09-02 (no row) (no row) (no row) 2023-09-03 (no row) (no row) (no row)34Views0likes1Comment