ADF Adds Hierarchical & JSON Data Transformations to Mapping Data Flows
Published Sep 28 2019 01:58 AM 23.9K Views
Microsoft

The Azure Data Factory team has released JSON and hierarchical data transformations to Mapping Data Flows. With this new feature, you can now ingest, transform, generate schemas, build hierarchies, and sink complex data types using JSON in data flows.

json5.png

In the sample data flow above, I take the Movies text file in CSV format, generate a new complex type called "Movies" that contains each of the attributes of the incoming CSV file.

json6.png

By using the Expression Builder (see above) inside of a Derived Column transform, I can define a new hierarchical structure that includes arrays. Genres is a multi-value field from the source CSV that contains 1:n genre categories for movies via a pipe (|) delimited field. I can use the ADF data flow expression language to split all values from that string and store each value as an array element.

json4.png

Mapping Data Flows now has the ability to show metadata for complex types including hierarchies and arrays in both metadata Inspect views as well as in Data Preview.

json2.pngjson1.png

Now that I've transformed my flat file into a hierarchical JSON format, I can sink this as a new JSON file to my Blob or lake using a JSON Sink. To create this new file, I'll map just the new complex types by selecting the top-level "Movies" hierarchy.

json7.png

JSON datasets can also now become source files for Data Flows. In this case, I'll read in my new JSON dataset from the above example and perform some transformations. I want to upper case each of the genres. To do this, I can apply a single, simple expression in a Derived Column, that will upper case each element in my Genres array that I created previously.

json3.png

We've posted a helper document on our ADF documentation site to help you work through more examples of working with hierarchies, arrays, and JSON datasets in ADF Mapping Data Flows here.

5 Comments
Copper Contributor

Super handy feature to avoid doing extra steps to deal with this kind of situations!! Thanks for adding this one.

Copper Contributor

Can you help us transform the following sample JSON to csv using ADF and/or Data Flows. 
{
"latest": "2019-10-10T11:09:09+00:00",
"employees": {
"5487": {
"id": "5487",
"action": "Updated",
"lastChanged": "2019-10-10T11:08:53+00:00"
},
"5491": {
"id": "5491",
"action": "Updated",
"lastChanged": "2019-10-10T11:09:01+00:00"
},
"5525": {
"id": "5525",
"action": "Updated",
"lastChanged": "2019-10-10T11:09:09+00:00"
}
}
}

Microsoft

In the Copy Activity, you can map these structures to targeted column names in your CSV flat file in the Sink mapping.

 

In Data Flows, you will create a Derived Column and build a flat structure of columns that map to the different structures.

 

Data Flows are enhancing the H2R capabilities in the tool, but for now, this is the mechanism to use.

Copper Contributor

Hi Mark, I have nested JSON File which contains data in array format also. While mapping I am getting error JSON Path inavalid at $[ResourceManagementProcess\\'sRunId]  due to escape character. I am using copy activity. Is there any other way to convert JSON to CSV format as data flow is not option to choose.

 

{
    "id": "ed0e4960-d9c5-11e6-85dc-d7996816aad3",
    "context": {
        "device": {
            "type": "PC"
        },
        "custom": {
            "dimensions": [
                {
                    "TargetResourceType": "Microsoft.Compute/virtualMachines"
                },
                {
                    "ResourceManagementProcess'sRunId": "827f8aaa-ab72-437c-ba48-d8917a7336a3"
                },
                {
                    "OccurrenceTime": "1/13/2017 11:24:37 AM"
                }
            ]
        }
    }
}

 

Microsoft

I am looking to transform json blob data to relational format using ADF as follows :
what is the way to do in ADF?
"reseller": {

                "en-ca": [

                    {

                        "MarketPlaceResellerId": "1",

                        "ResellerPrice": "",

                        "ResellerFormattedPrice": "",

                        "InventoryStatus": 3

                    }

                ],

                "en-us": [

                    {

                        "MarketPlaceResellerId": "2",

                        "ResellerPrice": "",

                        "ResellerFormattedPrice": "",

                        "InventoryStatus": 4

                    }

                ],

                "fr-ca": [

                    {

                        "MarketPlaceResellerId": "1",

                        "ResellerPrice": "",

                        "ResellerFormattedPrice": "",

                        "InventoryStatus": 3

                    }

Output :

                             

Market 

MarketPlaceResellerId 

ResellerPrice 

ResellerFormattedPrice 

InventoryStatus 

En-ca 

 

 

En-us 

 

 



Version history
Last update:
‎Sep 28 2019 01:58 AM
Updated by: