Blog Post

Azure Data Factory Blog
2 MIN READ

ADF Adds Hierarchical & JSON Data Transformations to Mapping Data Flows

Mark Kromer's avatar
Mark Kromer
Icon for Microsoft rankMicrosoft
Sep 28, 2019

The Azure Data Factory team has released JSON and hierarchical data transformations to Mapping Data Flows. With this new feature, you can now ingest, transform, generate schemas, build hierarchies, and sink complex data types using JSON in data flows.

In the sample data flow above, I take the Movies text file in CSV format, generate a new complex type called "Movies" that contains each of the attributes of the incoming CSV file.

By using the Expression Builder (see above) inside of a Derived Column transform, I can define a new hierarchical structure that includes arrays. Genres is a multi-value field from the source CSV that contains 1:n genre categories for movies via a pipe (|) delimited field. I can use the ADF data flow expression language to split all values from that string and store each value as an array element.

Mapping Data Flows now has the ability to show metadata for complex types including hierarchies and arrays in both metadata Inspect views as well as in Data Preview.

Now that I've transformed my flat file into a hierarchical JSON format, I can sink this as a new JSON file to my Blob or lake using a JSON Sink. To create this new file, I'll map just the new complex types by selecting the top-level "Movies" hierarchy.

JSON datasets can also now become source files for Data Flows. In this case, I'll read in my new JSON dataset from the above example and perform some transformations. I want to upper case each of the genres. To do this, I can apply a single, simple expression in a Derived Column, that will upper case each element in my Genres array that I created previously.

We've posted a helper document on our ADF documentation site to help you work through more examples of working with hierarchies, arrays, and JSON datasets in ADF Mapping Data Flows here.

Published Sep 28, 2019
Version 1.0

5 Comments

  • I am looking to transform json blob data to relational format using ADF as follows :
    what is the way to do in ADF?
    "reseller": {

                    "en-ca": [

                        {

                            "MarketPlaceResellerId": "1",

                            "ResellerPrice": "",

                            "ResellerFormattedPrice": "",

                            "InventoryStatus": 3

                        }

                    ],

                    "en-us": [

                        {

                            "MarketPlaceResellerId": "2",

                            "ResellerPrice": "",

                            "ResellerFormattedPrice": "",

                            "InventoryStatus": 4

                        }

                    ],

                    "fr-ca": [

                        {

                            "MarketPlaceResellerId": "1",

                            "ResellerPrice": "",

                            "ResellerFormattedPrice": "",

                            "InventoryStatus": 3

                        }

    Output :

                                 

    Market 

    MarketPlaceResellerId 

    ResellerPrice 

    ResellerFormattedPrice 

    InventoryStatus 

    En-ca 

     

     

    En-us 

     

     



  • RahulSangwan's avatar
    RahulSangwan
    Copper Contributor

    Hi Mark, I have nested JSON File which contains data in array format also. While mapping I am getting error JSON Path inavalid at $[ResourceManagementProcess\\'sRunId]  due to escape character. I am using copy activity. Is there any other way to convert JSON to CSV format as data flow is not option to choose.

     

    {
        "id": "ed0e4960-d9c5-11e6-85dc-d7996816aad3",
        "context": {
            "device": {
                "type": "PC"
            },
            "custom": {
                "dimensions": [
                    {
                        "TargetResourceType": "Microsoft.Compute/virtualMachines"
                    },
                    {
                        "ResourceManagementProcess'sRunId": "827f8aaa-ab72-437c-ba48-d8917a7336a3"
                    },
                    {
                        "OccurrenceTime": "1/13/2017 11:24:37 AM"
                    }
                ]
            }
        }
    }

     

  • In the Copy Activity, you can map these structures to targeted column names in your CSV flat file in the Sink mapping.

     

    In Data Flows, you will create a Derived Column and build a flat structure of columns that map to the different structures.

     

    Data Flows are enhancing the H2R capabilities in the tool, but for now, this is the mechanism to use.

  • Mukesh_Dutta's avatar
    Mukesh_Dutta
    Copper Contributor

    Can you help us transform the following sample JSON to csv using ADF and/or Data Flows. 
    {
    "latest": "2019-10-10T11:09:09+00:00",
    "employees": {
    "5487": {
    "id": "5487",
    "action": "Updated",
    "lastChanged": "2019-10-10T11:08:53+00:00"
    },
    "5491": {
    "id": "5491",
    "action": "Updated",
    "lastChanged": "2019-10-10T11:09:01+00:00"
    },
    "5525": {
    "id": "5525",
    "action": "Updated",
    "lastChanged": "2019-10-10T11:09:09+00:00"
    }
    }
    }

  • MartinZurita's avatar
    MartinZurita
    Copper Contributor

    Super handy feature to avoid doing extra steps to deal with this kind of situations!! Thanks for adding this one.