Mapping Data Flows
Alter Row Ignoring its Conditions
Hello. I have an ADF data flow with two sources: a blob container with JSON files and an Azure SQL table. The sink is the same SQL table as the SQL source, the idea being to conditionally insert new rows, update rows that have a later modified date in the JSON source, or do nothing if the ID already exists in the SQL table with the same modified date. In the data flow I join the rows on id, which is unique in both sources, and then use an Alter Row transformation to insert if the id column from the SQL source is null, update if it is not null but the last-updated timestamp in the JSON source is newer, or delete if the JSON timestamp is the same or older (delete is not permitted in the sink settings, so those rows should be ignored).

The problem is that I get a primary key violation when running the data flow because it tries to insert rows that already exist; my run history shows this for IDs that are clearly present (160806 is the minimum value for ID in the SQL database). For troubleshooting I put a filter directly after each source on that ticket ID so that, while debugging, I only see that single row. The Alter Row transformation should insert only if the SQLTickets id column is null, yet in the data preview of that same transformation the row is marked as an insert, despite the id column from both sources clearly having a value. However, when I evaluate the condition in the expression builder's own data preview, it correctly returns false.

I'm confused: I've used this technique in other data flows without any issues, so I really have no idea what's going on here, and I've been troubleshooting it for days without result. I've even tried putting a filter after the Alter Row transformation to explicitly remove rows where the SQL id column is not null and the timestamps are the same. The data preview shows them filtered out, yet a test run still tries to insert the rows it should be ignoring or updating. What am I doing wrong here?
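
For reference, this is the rule set described above, sketched in data flow script. It assumes a left outer join from the JSON source onto the SQL source (so unmatched rows surface a null SQLTickets@id); the stream names and the lastUpdated column names are placeholders for the actual schema:

    JoinOnId alterRow(
        insertIf(isNull(SQLTickets@id)),
        updateIf(!isNull(SQLTickets@id) && JsonTickets@lastUpdated > SQLTickets@lastUpdated),
        deleteIf(!isNull(SQLTickets@id) && JsonTickets@lastUpdated <= SQLTickets@lastUpdated)
    ) ~> MarkTicketRows

With delete disabled in the sink settings, rows that fall into the deleteIf branch are simply not written, which is the "do nothing" behaviour the post relies on.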

JSON data structure reconstruction from an array

Hi everyone, I am working on an Azure Cosmos DB migration where I need to transform the structure of documents in the "Notes" container. I have identified most of the migration steps, but I am currently facing a challenge with the structural conversion and parsing of a JSON array field inside an Azure Data Factory (ADF) data flow.

Problem context: I need to convert the "NoteDetails" array field from the original document format to a new normalized structure. The transformation involves flattening the array and restructuring it into a key-value format within a new field called "NormalizedNoteDetails".

Original document (current Cosmos DB structure):

    {
      "id": "123",
      "ownerId": "1234",
      "noteCategory": "somecategory",
      "NoteDetails": [
        { "Key": "PropertyNameKey1", "Value": ["prop1of1", "prop2of1"] },
        { "Key": "PropertyNameKey2", "Value": ["prop1of2", "prop2of2"] }
      ]
    }

Expected transformed structure (target format in Cosmos DB):

    {
      "id": "123",
      "ownerId": "1234",
      "noteCategory": "somecategory",
      "normalizedNoteDetails": {
        "PropertyNameKey1": ["prop1of1", "prop2of1"],
        "PropertyNameKey2": ["prop1of2", "prop2of2"]
      }
    }

Specific question: How can I efficiently parse and transform the "NoteDetails" array field inside an ADF Data Flow activity to achieve the new structure? I have tried Flatten and Pivot with no success, but I am a newbie to ADF. A concrete example would help me tremendously. Thanks.
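
A sketch of one possible approach (not verified against this exact schema): because the target is a single object whose keys come from NoteDetails[].Key and whose values come from NoteDetails[].Value, a Derived Column can build it in one step with the keyValues() map function. This assumes keyValues() accepts array-typed values and that member access on an array of structs (NoteDetails.Key) yields the array of keys; the stream names are placeholders:

    NotesSource derive(
        normalizedNoteDetails = keyValues(NoteDetails.Key, NoteDetails.Value)
    ) ~> DeriveNormalized

A Select (or the sink mapping) afterwards would drop the original NoteDetails column so that only the new map field is written back to Cosmos DB. Flatten followed by Pivot tends to produce separate data-driven columns rather than a single nested object, which is why it fights this particular target shape.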

How do I unpivot with schema drift enabled?

I have a source without a pre-defined schema (a file in a blob container), and I derive each column name using a column pattern expression; the data preview shows what I expect. I then have a Select step that selects each column and renames 'Marker name' to 'Marker_name22', and the data preview again shows what I expect (the same columns, with 'Marker name' renamed). Now, in the Unpivot step, I would like to ungroup by the 'Marker_name22' column and unpivot all other columns, but the 'Marker_name22' column is not available to choose. I am unsure how to proceed from here. Thanks in advance for the help.
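
One likely explanation (an assumption based on how drifted columns behave, not a confirmed diagnosis): columns that exist only through schema drift and column patterns are not part of the stream's declared projection, so settings that list columns by name, such as Unpivot's ungroup-by picker, cannot see them. The documented way to make a drifted column addressable is to materialize it in a Derived Column with byName() before the Unpivot, for example (stream names are placeholders):

    SelectAndRename derive(
        Marker_name22 = toString(byName('Marker_name22'))
    ) ~> MaterializeMarker

The "Map drifted" quick action in data preview generates an equivalent derived column automatically.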

Action Required: Switch from Memory Optimized Data Flows in Azure Data Factory to General Purpose

Azure Data Factory Memory Optimized data flows will be fully retired on April 1, 2027. Going forward, all ADF data flows will use the General Purpose SKU, which will provide performance superior to the current Memory Optimized offering at the General Purpose price.

Complex JSON from a REST API with a data flow

Hi. I have a REST API that returns a complex JSON document, and I want to flatten it. Do I have to store the JSON in a file first, or can I flatten the JSON coming from the REST API directly? Is there an example of this? (I have found videos on flattening complex JSON from a JSON file, but not directly from a REST API.) I get an error when I test the connection, even though the same API works properly in a Copy Data activity. Thank you.

I can't delete the source in a data flow

After adding a new source to my data flow, I couldn't delete it afterwards; nothing happened when I tried. After publishing, I saw that my data flow had become disconnected at the "incoming stream". I found the stream via search and linked it back, but I still couldn't delete the new source, although I could delete other sources. The source is configured just like many of my other sources, and if I select "Preview data" on its dataset, I see the data and everything looks fine. In the data flow I can see columns from the new source, combine them, and add a Select, but I cannot delete the new source. Every time I publish after making other changes, my data flow "breaks": in the end it let me publish, but it did not save any of the transformation stages; only the source and sink remained. In theory Azure should not let you publish a data flow whose source and sink are not connected, but it did. Where and how should I look for the problem? Has anyone encountered such a bug? As my screenshot shows, the flow is already published (i.e. saved) even though nothing is connected: yellow marks that there is no indication it is being edited, so it is published; green marks the source I cannot delete; blue marks the sink that is not connected to anything. All the steps between them are gone.

Azure Data Factory Copy Data activity changes data while copying Parquet data to a Dedicated SQL Pool

Hi all. We have a Parquet file in an ADLS Gen2 storage container with over 7 million rows of data. We created a Copy Data activity in Azure Data Factory to move this data into a table in a Dedicated SQL Pool. All of the data from the Parquet file lands in the database table accurately, except for one row, where a decimal value of 78.6 in the Parquet file arrives in the SQL table as 78.5. Here is more context on the steps we have taken so far to trace the root cause:

- We changed the Parquet file name and pushed it to the table again; the value still lands as 78.5.
- We created a version 2 table in the SQL database and pushed the data into it with the Copy Data activity; the value still lands as 78.5.
- We checked the compression type used to create the Parquet file in our Python code (GZIP) against the compression type configured on the Data Factory dataset connection; it was Snappy, so we changed it to GZIP and re-ran the Copy Data activity, and the value still lands as 78.5.
- We checked the decimal precision and scale as well as the data type mapping from source to sink; if either were wrong, the whole column should be affected, but only this one row goes into the SQL table incorrectly.

Has anyone encountered this issue before? If so, how did you solve it? Any suggestions are welcome. Thank you!
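
One possibility worth checking (an assumption, not a confirmed root cause): if that column is physically stored in the Parquet file as a 32-bit float, 78.6 has no exact binary representation and is actually stored as roughly 78.59999847, so any conversion that truncates rather than rounds will land on 78.5 once the value is reduced to one decimal place. If the load were routed through a mapping data flow instead of a plain Copy Data activity, the rounding could be made explicit with a derived column along these lines (the column name and the precision/scale are placeholders):

    ParquetSource derive(
        measureValue = toDecimal(round(toDouble(measureValue), 1), 18, 1)
    ) ~> RoundBeforeSink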

How to handle the Azure Data Factory Lookup activity with more than 5,000 records

Hello experts. A Data Flow activity successfully copies data from an Azure Blob Storage .csv file into Dataverse table storage. However, an error occurs when performing a Lookup on Dataverse because there is too much data. This is in line with the documentation, which states that the Lookup activity is limited to 5,000 rows and a maximum size of 4 MB. The documentation also mentions a workaround: design a two-level pipeline where the outer pipeline iterates over an inner pipeline, which retrieves data that doesn't exceed the maximum rows or size. How can I do this? Is there a way to define an offset (e.g. read only 1,000 rows at a time)? Thanks, Sri
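
A sketch of that two-level pattern for the Dataverse case, under the assumption that the inner pipeline's Lookup source accepts a FetchXML query (the entity and attribute names here are placeholders): the outer pipeline loops over page numbers with a ForEach or Until activity and passes each page number to the inner pipeline, whose Lookup keeps every call under the 5,000-row cap using FetchXML's count and page attributes.

    <!-- Inner pipeline Lookup query; the page number is injected via dynamic content -->
    <fetch count="1000" page="@{pipeline().parameters.pageIndex}">
      <entity name="account">
        <attribute name="accountid" />
        <attribute name="name" />
      </entity>
    </fetch>

The outer pipeline then only needs a sequence of page numbers to iterate over (for example range(1, numberOfPages) as the ForEach items), plus a way to stop once a page comes back empty if the total row count is not known in advance.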

How to load data from on-prem to Snowflake using ADF in a better way

Hi, my use case is as follows. Our data source is an on-prem SQL Server, which serves as our production database. We are building reports in Power BI and using Snowflake as our data warehouse. I want to extract 10 to 15 tables into Snowflake for my Power BI reporting, specifically as an SCD Type 1 pipeline, with no need to retain historical data. To move the data from on-prem to Snowflake we use Azure Data Factory, with blob storage set up in Azure, and we already have a self-hosted integration runtime in place that connects to Data Factory. Currently I use a ForEach activity to copy the 15 tables from on-prem to Snowflake. The pipeline runs daily, and on each run it truncates all 15 tables before loading the data. However, this process is time-consuming due to the volume of data, especially since our on-prem SQL Server is a legacy database. My questions are as follows:

1. Is there a more efficient approach within Data Factory for this task?
2. What are the recommended best practices for this type of case study?
3. Data flows are not working with the self-hosted integration runtime; how can I enable data flows for an on-prem database?

I would greatly appreciate any advice on the correct solution, or pointers to relevant documentation or blogs.