Azure Data Factory
169 TopicsOracle 2.0 Upgrade Woes with Self-Hosted Integration Runtime
This past weekend my ADF instance finally got the prompt to upgrade linked services that use the Oracle 1.0 connector, so I thought, "no problem!" and got to work upgrading my self-hosted integration runtime to 5.50.9171.1 Most of my connection use service_name during authentication, so according to the docs, I should be able to connect using the Easy Connect (Plus) Naming convention. When I do, I encounter this error: Test connection operation failed. Failed to open the Oracle database connection. ORA-50201: Oracle Communication: Failed to connect to server or failed to parse connect string ORA-12650: No common encryption or data integrity algorithm https://docs.oracle.com/error-help/db/ora-12650/ I did some digging on this error code, and the troubleshooting doc suggests that I reach out to my Oracle DBA to update Oracle server settings. Which, I did, but I have zero confidence the DBA will take any action. https://learn.microsoft.com/en-us/azure/data-factory/connector-troubleshoot-oracle Then I happened across this documentation about the upgraded connector. https://learn.microsoft.com/en-us/azure/data-factory/connector-oracle?tabs=data-factory#upgrade-the-oracle-connector Is this for real? ADF won't be able to connect to old versions of Oracle? If so I'm effed because my company is so so legacy and all of our Oracle servers at 11g. I also tried adding additional connection properties in my linked service connection like this, but I have honestly no idea what I'm doing: Encryption client: accepted Encryption types client: AES128, AES192, AES256, 3DES112, 3DES168 Crypto checksum client: accepted Crypto checksum types client: SHA1, SHA256, SHA384, SHA512 But no matter what, the issue persists. :( Am I missing something stupid? Are there ways to handle the encryption type mismatch client-side from the VM that runs the self-hosted integration runtime? I would hate to be in the business of managing an Oracle environment and tsanames.ora files, but I also don't want to re-engineer almost 100 pipelines because of a connector incompatability.Solved2.5KViews3likes14CommentsHow to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow?
How to Flatten Nested Time-Series JSON from API into Azure SQL using ADF Mapping Data Flow? Hi Community, I'm trying to extract and load data from API returning the following JSON format into an Azure SQL table using Azure Data Factory. { "2023-07-30": [], "2023-07-31": [], "2023-08-01": [ { "breakdown": "email", "contacts": 2, "customers": 2 } ], "2023-08-02": [], "2023-08-03": [ { "breakdown": "direct", "contacts": 5, "customers": 1 }, { "breakdown": "referral", "contacts": 3, "customers": 0 } ], "2023-08-04": [], "2023-09-01": [ { "breakdown": "direct", "contacts": 76, "customers": 40 } ], "2023-09-02": [], "2023-09-03": [] } Goal: I want to flatten this nested structure and load it into Azure SQL like this: Expand table ReportDate Breakdown Contacts Customers 2023-07-30 (no row) (no row) (no row) 2023-07-31 (no row) (no row) (no row) 2023-08-01 email 2 2 2023-08-02 (no row) (no row) (no row) 2023-08-03 direct 5 1 2023-08-03 referral 3 0 2023-08-04 (no row) (no row) (no row) 2023-09-01 direct 76 40 2023-09-02 (no row) (no row) (no row) 2023-09-03 (no row) (no row) (no row)17Views0likes1CommentAnother Oracle 2.0 issue
It seemed like Oracle LS 2.0 was finally working in production. However, some pipelines have started to fail in both production and development environments with the following error message: ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.lang.ArrayIndexOutOfBoundsException:255 total entry:1 com.microsoft.datatransfer.bridge.parquet.ParquetWriterBuilderBridge.addDecimalColumn(ParquetWriterBuilderBridge.java:107) .,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,' When I revert the Linked Service version back to 1.0, the copy activity runs successfully. Has anyone encountered this issue before or found a workaround?46Views0likes0CommentsADF dataflow data Preview Error
hi All, I have data flow as seen below. all linked service and data set working fine and i can see the data preview but wheb i use the same linked service and dateset in the dataflow It throw error as shown below i am useing managed private endpoint to coonect the blob starga it is owrking for all pipe line. the ADF and the MI has staorgae account contributor role assigned. Error: at Source 'sourcedata': This request is not authorized to perform this operation. When using Managed Identity(MI)/Service Principal(SP) authentication 1. For source: In Storage Explorer, grant the MI/SP at least Execute permission for ALL upstream folders and the file system, along with Read permission for the files to copy. Alternatively, in Access control (IAM), grant the MI/SP at least the Storage Blob Data Reader role. 2. For sink: In Storage Explorer, grant the MI/SP at least Execute permission for ALL upstream folders and the file system, along with Write permission for the sink folder. Alternatively, in Access control (IAM), grant the MI/SP at least the Storage Blob Data Contributor role. Also please ensure that the network firewall settings in the storage account are configured correctly as turning on firewall rules for your storage account blocks incoming requests for data by default, unless the requests originate from a service operating within an Azure Virtual Network (VNet) or from allowed public IP addresses. Any kind of help is highly appreciated70Views0likes1CommentADF Data Flow Fails with "Path does not resolve to any file" — Dynamic Parameters via Trigger
Hi guys, I'm running into an issue with my Azure Data Factory pipeline triggered by a Blob event. The trigger passes dynamic folderPath and fileName values into a parameterized dataset and mapping data flow. Everything works perfectly when I debug the pipeline manually or trigger the pipeline manually with the trigger and pass in the values for folderPath and fileName directly. However, when the pipeline is triggered automatically via the blob event, the data flow fails with the following error: Error Message: Job failed due to reason: at Source 'CSVsource': Path /financials/V02/Forecast/ForecastSampleV02.csv does not resolve to any file(s). Please make sure the file/folder exists and is not hidden. At the same time, please ensure special character is not included in file/folder name, for example, name starting with _ I've verified the blob file exists. The trigger fires correctly and passes parameters The path looks valid. The dataset is parameterized correctly with @dataset().folderPath and @dataset().fileName I've attached screenshots of: 🔵 00-Pipeline Trigger Configuration On Blob creation 🔵 01-Trigger Parameters 🔵 02-Pipeline Parameters 🔵 03-Data flow Parameters 🔵 04-Data flow Parameters without default value 🔵 05-Data flow CSVsource parameters 🔵 06-Data flow Source Dataset 🔵 07-Data flow Source dataset Parameters 🔵 08-Data flow Source Parameters 🔵 09-Parameters passed to the pipeline from the trigger 🔵 10-Data flow error message Here are all the images What could be causing the data flow to fail on file path resolution only when triggered, even though the exact same parameters succeed during manual debug runs? Could this be related to: Extra slashes or encoding in trigger output? Misuse of @dataset().folderPath and fileName in the dataset? Limitations in how blob trigger outputs are parsed? Any insights would be appreciated! Thank youSolved60Views0likes1CommentWhat Are the Ways to Dynamically Invoke Pipelines in ADF from Another Pipeline?
I am exploring different approaches to dynamically invoke ADF pipelines from within another pipeline as part of a modular and scalable orchestration strategy. My use case involves having multiple reusable pipelines that can be called conditionally or in sequence, based on configuration stored externally (such as in a SQL Managed Instance or another Azure-native source). I am aware of a few patterns like using the Execute Pipeline activity within a ForEach loop, but I would like to understand the full range of available and supported options for dynamically invoking pipelines from within ADF. Could you please clarify the possible approaches for achieving this? Specifically, I am interested in: Using ForEach with Execute Pipeline activity How to structure the control flow for calling multiple pipelines in sequence or parallel. How to pass pipeline names dynamically. Dynamic pipeline name resolution Is it possible to pass the pipeline name as a parameter to the Execute Pipeline activity? How to handle validation when the pipeline name is dynamic? Parameterized execution Best practices for passing dynamic parameters to each pipeline when calling them in a loop or based on external config. Calling ADF pipelines via REST API or Web Activity When would this be preferred over native Execute Pipeline? How to handle authentication and response handling? If there are any recommendations, gotchas, or best practices related to dynamic pipeline orchestration in ADF, I would greatly appreciate your insights. Thanks!29Views0likes0CommentsHow to Orchestrate ADF Pipelines as Selectable Steps in a Configurable Job
I am working on building a dynamic job orchestration mechanism using Azure Data Factory (ADF). I have multiple pipelines in ADF, and each pipeline represents a distinct step in a larger job. I would like to implement a solution where I can dynamically select or deselect individual pipeline steps (i.e., ADF pipelines) as part of a job. The idea is to configure a job by checking/unchecking steps, and then execute only the selected ones in sequence or based on dependencies. Available resources for this solution: Azure Data Factory (ADF) Azure SQL Managed Instance (SQL MI) Any other relevant Azure-native service (if needed) Could you please suggest a solution that meets the following requirements: Dynamically configure which pipelines (steps) to include in a job. Add or remove steps without changing hardcoded logic in ADF. Ensure scalability and maintainability of the orchestration logic. Keep the solution within the scope of ADF, SQL MI, and potentially other Azure-native services (no external apps or third-party orchestrators). Any design pattern, architecture recommendations, or examples would be greatly appreciated. Thanks!22Views0likes0CommentsHow to Configure Authentication for Web Activity Triggering ADF Pipelines via Azure REST API
Hello, I am working on integrating Azure Data Factory (ADF) with external systems using Web Activities. I am specifically using a Web Activity to trigger ADF pipelines via the Azure REST API, as described in the official documentation here: https://learn.microsoft.com/en-us/rest/api/datafactory/pipelines/create-run?view=rest-datafactory-2018-06-01 I can configure the request method and URL in the Web Activity, but I am unsure about the supported and recommended methods for authentication. Could someone please clarify: What are the possible ways to configure authentication in Web Activities when calling Azure REST APIs (such as for creating a pipeline run)? Is it possible to use Managed Identity (System-assigned or User-assigned) directly within the Web Activity? If not, what are the alternatives (e.g., service principal with token acquisition)? Are there any best practices or security considerations when configuring authentication for this use case? Thanks in advance for your help!12Views0likes0CommentsAzure Devops and Data Factory
I have started a new job and taken over ADF. I know how to use Devops to integrate and deploy when everything is up and running. The problem is, it's all out of sync. I need to learn ADO/ADF as they work together so I can fix this. Any recommendations on where to start? Everything on YouTube is starting with a fresh environment which I'd be fine with. I'm not new to ADO, but I've never been the setup guy before. And I'm strong on ADO management, just using it. Here are some of the problems I have: A lot of work has been done directly in the DEV branch rather than creating feature branches. Setting up a pull request from DEV to PROD wants to pull everything. Even in-progress or abandoned code changes. Some changes were made in the PROD branch directly, so I'll need to pull those changes back to DEV. We have valid changes in both DEV and PROD. I'm having trouble cherry-picking. It only lets me select one commit, then says I need to use command-line. It doesn't tell me the error. I don't know what tool to use for the command line. I've tried using Visual Studio, and I can pull in the Data Factory code, but have all the same problems there. I'm not looking for an answer to the questions, but how to find the answer to these questions. Is this Data Factory, or should I be looking at Devops? I'm having no trouble managing the database code or Power BI in Devops, but I created that fresh. Thanks for any help!Solved174Views0likes4CommentsData transformation using PostgreSQL connection
Hi Team Currently, I am working on a project where I need to incrementally load one table data from Schema A of Azure postgreSQL database to schema B. The table i am using have 100+ columns and few of them are numeric and whenever i am trying to read data from source table using data flows its generating Null values for all numeric columns. See following screenshot for reference. Source: Destination: If you notice, the last row contains data for numeric data types as well. To ensure ADF reads numeric and decimal values correctly, I clicked on the "overwrite schema" button in the data flow source and reselected the same decimal data type, as ADF converts numeric data types to decimal. I suspect this might be a bug in ADF. Could you please let me know where I can report this issue? Additionally, is there a better way to resolve it? Thanks, Dinesh21Views0likes0Comments