How To Convert CSV File Into Array Of JSONs In ADF
Published Oct 02 2022 04:57 AM
Microsoft

Introduction
Azure Data Factory (ADF) is well suited to data transformation. In this blog we will discuss how to convert a CSV file into JSON and explain the aggregate activity.

Main Idea
In ADF, a JSON object is a complex data type. We want to build an array that consists of JSON objects.
The idea is to create a DataFlow that adds a key "children" to the data and then aggregates the JSON objects into an array of JSONs using the aggregate activity.
We add a dummy column with a constant value of 1 and group by it. Because every row has the same dummy value, all rows fall into a single group and are collected into one array.
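To make the grouping trick concrete, here is a minimal Python sketch of the same idea outside ADF. The sample rows are hypothetical (shaped like the expected output in this post), and plain dictionaries stand in for data flow rows. It is an illustration of the concept, not what ADF runs internally.

```python
from collections import defaultdict

# Hypothetical rows, shaped like the sample output in this post.
rows = [
    {"key1": "a1", "key2": "b1", "key3": "c1", "key4": "d1"},
    {"key1": "a2", "key2": "b2", "key3": "c2", "key4": "d2"},
]

# Tag every row with the same dummy value and wrap it under "children".
tagged = [{"dummy": 1, "children": row} for row in rows]

# Group by the dummy value and collect the "children" objects of each group.
groups = defaultdict(list)
for item in tagged:
    groups[item["dummy"]].append(item["children"])

# Every row shares dummy == 1, so there is exactly one group holding all rows.
result = {"children": groups[1]}
print(result)
```

Since the grouping key is constant, the number of groups is always one, which is why the aggregate yields a single array rather than one object per row.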

Pre-requisites

We will require:

  • Basic knowledge of ADF, including how to create a new pipeline and add activities and dataflows to it.
  • Knowing how to save data to blob storage.

 

Prepare your data:
Input CSV file:

Sally_Dabbah_0-1658050590147.png
Expected Output:

{"children": [
{"key1":"a1", "key2":"b1", "key3":"c1", "key4":"d1"},
{"key1":"a2", "key2":"b2", "key3":"c2", "key4":"d2"},
...
]}
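
As a point of comparison, the same input-to-output mapping can be sketched in a few lines of Python. The CSV content below is an assumption: the real input file is only shown as a screenshot, so the header names key1 to key4 are inferred from the expected output.

```python
import csv
import io
import json

# Hypothetical CSV content; the header names are inferred from the expected output.
CSV_TEXT = "key1,key2,key3,key4\na1,b1,c1,d1\na2,b2,c2,d2\n"


def csv_to_children(csv_text):
    """Parse CSV text (first row is the header) into {"children": [...]}."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Each CSV row becomes one JSON object inside the "children" array.
    return {"children": [dict(row) for row in reader]}


output = csv_to_children(CSV_TEXT)
print(json.dumps(output, indent=2))
```

This is handy for checking a small sample by hand before comparing it with what the data flow writes to the sink.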

Services

We will need an ADF instance and a Blob storage account for the source and the sink.


ADF DataFlow:

Sally_Dabbah_4-1658051527623.png

 

 The settings for the activities in the dataflow:

  • Source:
    A Blob storage account. Load the CSV data and select the first row as a header.
  • Map Drifted Columns:
    This gives us the ability to perform actions on the columns.
  • Derived Columns:
    Here we add the dummy column with a constant value of 1, and a children column that will later hold the array of JSONs.
    To build the children column, go to Expressions -> Expression Builder, click on children, and add 4 sub columns named key1, key2, key3 and key4.
    Sally_Dabbah_1-1658051171923.png

    Click on each key and pass the matching column as the input (expression) for that key (see the screenshot below).

    Sally_Dabbah_2-1658051237554.png

    Click Save and finish.



  • Aggregate By Dummy:
    In this activity we group the data by the dummy column that we added and collect all values under children. This builds an array of JSONs instead of a JSON of JSONs.
    Click on the activity -> group by dummy -> aggregates -> children -> collect(children)
    Sally_Dabbah_3-1658051419025.png
  • Drop Dummy Column:
    Select only the children array.
  • Sink:
    A Blob storage account, where we write the output.

    Output:

    Sally_Dabbah_5-1658051865894.png
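
The dataflow steps above can also be read as a chain of small functions. The Python sketch below mirrors them one by one. It is only an analogy under stated assumptions: the source column names (col_a to col_d) and the SUB_COLUMNS mapping are hypothetical, and the function names are chosen to echo the step names in this post rather than any ADF API.

```python
import json

# Hypothetical mapping from each children sub column to a source CSV column.
SUB_COLUMNS = {"key1": "col_a", "key2": "col_b", "key3": "col_c", "key4": "col_d"}


def derive(row):
    """Derived Columns: add dummy = 1 and a nested children object."""
    children = {key: row[source] for key, source in SUB_COLUMNS.items()}
    return {**row, "dummy": 1, "children": children}


def aggregate_by_dummy(rows):
    """Aggregate By Dummy: one output row per dummy value, children collected."""
    collected = {}
    for row in rows:
        collected.setdefault(row["dummy"], []).append(row["children"])
    return [{"dummy": dummy, "children": items} for dummy, items in collected.items()]


def drop_dummy(rows):
    """Drop Dummy Column: keep only the children array."""
    return [{"children": row["children"]} for row in rows]


# Hypothetical source rows, as if read from a CSV with a header row.
source_rows = [
    {"col_a": "a1", "col_b": "b1", "col_c": "c1", "col_d": "d1"},
    {"col_a": "a2", "col_b": "b2", "col_c": "c2", "col_d": "d2"},
]

sink_rows = drop_dummy(aggregate_by_dummy([derive(r) for r in source_rows]))
print(json.dumps(sink_rows[0]))
```

Keeping the sub column mapping in one place matches how the Expression Builder step assigns one source column to each key.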

     

Last update: Aug 15 2022 02:31 AM