Mapping Data Flows

Auto Create SQL table from Imported CSV in ADF?
Hi all, wondering if it's possible to automatically create a SQL table from an imported CSV in Data Factory? To make things a little more complicated, not all CSVs will have the same headings, and the number of headings differs. If so, can anyone point me in the direction of a how-to guide?

Case: I've been ingesting CSVs using an HTTP connector in ADF, storing the CSV data into a SQL table (manually created), and then transforming and cleaning that data into a datastore SQL table that is also manually created. I know I'm a little slow to the party, but I've been looking at using parameters and was wondering whether I could pull the CSVs into blob storage and transform from there. Then I'd only need to create one SQL table, and if I could automate the entire process in ADF that would save future me a lot of time. I have another issue, but I'll post that separately as it's a slightly different topic.

Thanks for reading. Hope someone can point me in the right direction.
John
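A minimal sketch of the ingest step, assuming the Copy activity's SQL sink "auto create table" option fits this case; the activity, dataset, and table names below are placeholders, not anything from the post:

    {
        "name": "CopyCsvToSql",
        "type": "Copy",
        "inputs": [ { "referenceName": "CsvFromHttp", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "StagingSqlTable", "type": "DatasetReference" } ],
        "typeProperties": {
            "source": { "type": "DelimitedTextSource" },
            "sink": {
                "type": "AzureSqlSink",
                "tableOption": "autoCreate"
            }
        }
    }

With the sink dataset's table name parameterized per file, each differently shaped CSV could land in its own automatically created staging table before the clean-up step.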
Convert Arrays or Complex Types (JSON) to string in Data Flow

The current scenarios allow you to convert arrays or complex types (JSON) into single columns, or convert columns into JSON. However, is there a way to convert the JSON into a string representation for storage in a SQL column?

Example: I have a JSON file that contains a section of known data values which map directly to columns. The file also contains "additional data" that is not necessarily in a known structure, so I simply want to stuff that data into a column for later use.

Source file:

    {
        "id": 1234,
        "data": {
            "name": "Acme, Inc.",
            "address": {
                "city": "Seattle",
                "state": "WA",
                "zip": "98195"
            }
        },
        "additionalData": {
            "color": "red",
            "shape": "square",
            "number": 4
        }
    }

In this case, I don't want to store color, shape, and number as distinct columns, but as text: { "color": "red", "shape": "square", "number": 4 }. Is this possible currently, or is it something coming in the future?
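For clarity, this is the row shape the sink would need, with the known fields flattened to columns and additionalData kept as serialized text (column names taken from the example above; purely illustrative, not a statement of what Data Flow produces today):

    {
        "id": 1234,
        "name": "Acme, Inc.",
        "city": "Seattle",
        "state": "WA",
        "zip": "98195",
        "additionalData": "{\"color\":\"red\",\"shape\":\"square\",\"number\":4}"
    }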
How to handle azure data factory lookup activity with more than 5000 records

Hello experts, the Data Flow activity successfully copies data from an Azure Blob Storage .csv file to Dataverse table storage. However, an error occurs when performing a Lookup on the Dataverse due to excessive data. This is in line with the documentation, which states that the Lookup activity has a limit of 5,000 rows and a maximum size of 4 MB. There is also a workaround mentioned (Microsoft documentation): design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size.

How can I do this? Is there a way to define an offset (e.g. only read 1,000 rows)?

Thanks,
-Sri
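A rough sketch of the outer half of that two-level pattern, assuming the number of pages can be passed in as a parameter; the inner pipeline's Lookup would then fetch a single page per call (for Dataverse that would mean paging the query itself, for example with FetchXML's page/count attributes, which is an assumption here rather than something quoted from the docs). Pipeline, activity, and parameter names are placeholders:

    {
        "name": "ForEachPage",
        "type": "ForEach",
        "typeProperties": {
            "items": {
                "value": "@range(1, pipeline().parameters.PageCount)",
                "type": "Expression"
            },
            "activities": [
                {
                    "name": "RunInnerLookup",
                    "type": "ExecutePipeline",
                    "typeProperties": {
                        "pipeline": { "referenceName": "InnerLookupPipeline", "type": "PipelineReference" },
                        "parameters": { "PageNumber": "@item()" }
                    }
                }
            ]
        }
    }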
Unpivot Transformation Error: XX column is not of the same unpivot datatype (Data flow in ADF)

Hi everyone, I want to unpivot the following data table, but I get the error "KPI1 column is not of the same unpivot datatype". I checked the KPI1 datatype: it is long. Since long is one of the numerical datatypes (integer, decimal, long, double, etc.), I expected all numerical columns to be treated in the same way. Is there any requirement on datatypes in the unpivot transformation, for example for numerical columns? Thank you for any insights to fix the error.
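A small sketch of one way this kind of mismatch is often handled, assuming the problem is that the unpivoted value columns do not share an identical type: cast every KPI column to one common type in a Derived Column placed before the Unpivot. KPI1 is the column named in the post; KPI2 and the choice of double are hypothetical:

    KPI1 : toDouble(KPI1)
    KPI2 : toDouble(KPI2)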
Connection Failed - new linked services azure data factory

Hi, I am having trouble creating a new linked service on Azure for connecting it with MySQL. I am getting a weird error saying:

    Connection failed
    The value of the property '' is invalid: 'Format of the initialization string does not conform to specification starting at index 115.'
    Format of the initialization string does not conform to specification starting at index 115.

Is anyone aware of this error?
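That message is a connection-string parse error: the string itself is malformed around the reported character position, often because of a missing separator or an unescaped special character (such as a semicolon) in the password. As a rough reference only, an assumed shape for a MySQL connection string; the exact keys should be checked against the connector documentation, and all values below are placeholders:

    Server=myserver.mysql.database.azure.com; Port=3306; Database=mydb; UID=myuser; PWD=<password>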
Job aborted due to stage failure. Executor heartbeat timed out error after 203646 ms

Hi, we are getting the error below sometimes, seemingly at random, during the execution of different mapping data flows in Azure Data Factory:

    Job aborted due to stage failure. Task in stage failed 1 times, most recent failure. Lost task in stage. Executor heartbeat timed out error after 203646 ms.

Is anyone aware of the reason for this? Any idea to resolve it would be appreciated.
ADF Dataflow error

Hi, while executing different mapping data flows in Azure Data Factory we received the below error a few times. Can anyone give any idea about the root cause of the error and how to resolve it?

    org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage failed 1 times, most recent failure: Lost task 0.0 in stage executor 0: ExecutorLostFailure (executor 0 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 136606 ms
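This and the previous topic report the same class of failure: the Spark executor running the data flow stops responding, which is often a symptom of memory pressure or an undersized cluster for the data volume rather than a problem in the flow logic. A hedged sketch of the Execute Data Flow activity settings that are typically adjusted first (the values are illustrative, and the property names assume the activity's JSON shape):

    "typeProperties": {
        "dataflow": { "referenceName": "MyDataFlow", "type": "DataFlowReference" },
        "compute": {
            "computeType": "MemoryOptimized",
            "coreCount": 16
        }
    }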
How to load data from On-prem to Snowflake using ADF in a better way

Hi, my use case is as follows: our data source is an on-prem SQL Server, and it serves as our production database. Currently, we are building reports in Power BI and using Snowflake as our data warehouse. I aim to extract 10 to 15 tables for my Power BI reporting into Snowflake, specifically wanting to construct an SCD Type 1 pipeline, without the need to retain historical data.

To facilitate the data transfer from on-prem to Snowflake, we are leveraging Azure Data Factory, with blob storage set up in Azure. We already have a self-hosted integration runtime in place that connects to Data Factory. Currently, I've employed the ForEach activity to copy 15 tables from on-prem to Snowflake. The pipeline is scheduled to run daily, and each time it executes, it truncates all 15 tables before loading the data. However, this process is time-consuming due to the volume of data, especially since our on-prem SQL Server is a legacy database.

My questions are as follows:
1. Is there a more efficient approach within Data Factory for this task?
2. What are the best practices recommended for this type of case study?
3. Data flow is not functioning with the self-hosted runtime; how can I activate data flow for an on-prem database?

I would greatly appreciate any advice on the correct solution for this, or pointers to relevant documentation or blogs.
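One hedged sketch for the first question: instead of truncate-and-reload, copy only the rows changed since the last run and merge them into Snowflake (SCD Type 1 keeps only the latest version, so a MERGE on the business key is enough). This assumes each source table has a usable modified-date column, which the post does not confirm; the table, column, and parameter names below are placeholders:

    "source": {
        "type": "SqlServerSource",
        "sqlReaderQuery": {
            "value": "SELECT * FROM dbo.Orders WHERE LastModifiedDate > '@{pipeline().parameters.LastLoadDate}'",
            "type": "Expression"
        }
    }

The delta rows would then be merged into the target Snowflake table rather than truncating it each night, which keeps the daily run proportional to the amount of change instead of the full table size.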
Dataflow Source Settings error that doesn't replicate when I test connection

Hope someone can help. I get the following error when running a data flow; however, when I run the test connection it works fine. Can someone tell me why a connection will work as part of a test but not in a data flow? I've obviously made a mistake somewhere, but I don't know where.

    Error code: InvalidTemplate
    Failure type: User configuration issue
    Details: The template function 'dataset' is not defined or not valid
    Source: Pipeline: codes_Output_Concat
    Data flow: CodesDataFlow_Concatinate
    Monitor: Data flow activity codes_Output_Concat

What is the data flow supposed to do? The source is supposed to allow a user to select a file using a parameter (the name of the file); the parameter has been called TransferFile. The select section then takes the CSV's headers and removes some invalid characters before moving on to the derived columns, which use the byName function to pick out the columns I want to keep and clean them up a bit. The sink is then supposed to create a SQL table to put the cleaned data into.

Here are the parameters I've added to the dataflow. The dataflow does work when I add specific datasets with the names of the source data, but not when I use the parameters for this. The parameters work well on the other copy data pipeline I created for the first part of this process. I've left that out of this altogether as I didn't want to confuse things.

Thanks for reading. Hope someone can point me in the right direction.
John.
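The "template function 'dataset' is not defined" error generally means a pipeline/dataset-style @ expression is being evaluated somewhere it isn't understood; inside a mapping data flow, parameters are referenced with a $ prefix instead. A minimal illustration, using the TransferFile parameter named in the post (treating this as the likely cause is an assumption, not a confirmed diagnosis):

    @dataset().TransferFile               // dataset-definition / pipeline expression syntax
    $TransferFile                         // mapping data flow expression syntax
    concat('cleaned_', $TransferFile)     // e.g. combining the parameter with other text inside the flow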
Incrementally Load data from Azure SQL DB to ADLS Gen2 without Watermarking with Partitioning

I want to incrementally load data from Azure SQL Database to ADLS Gen2 without watermarking, i.e. I do not have any control over the Azure SQL Database apart from using the linked service and table. Also, the folder structure in ADLS is YYYY/MM, which comes from one of the columns of the Azure SQL Database.

I added two sources (SQL, ADLS) to the data flow and created a Derived Column transformation to split the YYYYMM column from SQL into YYYY and MM. Now I want to take these YYYY and MM columns into the ADLS source and search in the respective YYYY, MM folders. How do I achieve it? Below is the screenshot.

Tzvia or anyone else
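A small sketch of the split itself, assuming YYYYMM arrives as a six-character value such as 202401 (substring in the data flow expression language is 1-based); the source column name follows the post, while Year and Month are just illustrative output names:

    Year  : substring(toString(YYYYMM), 1, 4)
    Month : substring(toString(YYYYMM), 5, 2)

Using these values to steer the ADLS source would still need the folder to be exposed on that side as well, for example by reading with a wildcard path and surfacing the partition folders as columns to join or filter against; whether that fits this setup is an assumption, since the screenshot isn't available here.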