Introduction
Azure Data Factory (ADF) is well suited to data transformation. In this blog, we will discuss how to convert a CSV file into JSON and explain the Aggregate transformation.
Main Idea
In ADF, a JSON object is a complex data type. Our goal is to build an array of JSON objects.
The idea is to create a data flow that adds a "children" column to the data, then use the Aggregate transformation to collect these JSON objects into an array.
We will add a dummy column with a constant value of 1 and group by it, so that all rows fall into a single group and are collected into one array.
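To make the dummy-key idea concrete, here is a minimal plain-Python sketch of what grouping by a constant column and collecting another column does. It only mimics the semantics of an aggregate with `collect()`; it is not ADF code, and the helper name `group_and_collect` and the sample rows are hypothetical.

```python
from collections import defaultdict


def group_and_collect(rows, group_key, value_key):
    """Group rows by row[group_key] and collect row[value_key] into a list per group.

    Plain-Python imitation of an aggregate that groups by one column
    and applies collect() to another column.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    # One output row per distinct group value, as an aggregate produces.
    return [{group_key: k, value_key: v} for k, v in groups.items()]


# Hypothetical rows after the constant dummy column (dummy = 1) has been added.
rows = [
    {"dummy": 1, "children": {"key1": "a1", "key2": "b1", "key3": "c1", "key4": "d1"}},
    {"dummy": 1, "children": {"key1": "a2", "key2": "b2", "key3": "c2", "key4": "d2"}},
]

result = group_and_collect(rows, "dummy", "children")
print(len(result))  # 1: every row shares dummy = 1, so there is a single group
print(result[0]["children"][1]["key1"])  # a2
```

Grouping by a column whose value differs per row would instead produce one group per distinct value, which is why a constant is used as the grouping key.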
Prerequisites
We will require:
- Basic knowledge of ADF, including how to create a new pipeline and add activities/data flows to a pipeline.
- Knowing how to save data to Blob Storage.
Prepare your data:
Input CSV file:
Expected Output:
{"children": [
{"key1":"a1", "key2":"b1", "key3":"c1", "key4":"d1"},
{"key1":"a2", "key2":"b2", "key3":"c2", "key4":"d2"},
...
]}
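Outside ADF, the same conversion can be written in a few lines of Python, which can be handy for checking what the data flow should produce. This sketch assumes the CSV header row is `key1,key2,key3,key4`. The original input file is not reproduced here, so these column names, the sample values and the function name `csv_to_children_json` are hypothetical.

```python
import csv
import io
import json


def csv_to_children_json(csv_text: str) -> str:
    """Convert CSV text (first row = header) into {"children": [{...}, ...]}."""
    # DictReader uses the first row as the header, similar to "First row as header" in ADF.
    reader = csv.DictReader(io.StringIO(csv_text))
    children = [dict(row) for row in reader]
    return json.dumps({"children": children})


# Hypothetical contents of metadata.csv (column names assumed to be key1..key4).
sample_csv = "key1,key2,key3,key4\na1,b1,c1,d1\na2,b2,c2,d2\n"

print(csv_to_children_json(sample_csv))
```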
Services
We will need:
- Blob Storage account: in the Blob Storage account, we will save our input CSV file (metadata.csv).
- ADF account: add the file metadata.csv as a dataset in ADF.
Links:
Datasets - Azure Data Factory & Azure Synapse | Microsoft Docs
Azure Blob Storage documentation | Microsoft Docs
ADF Data Flow:
The settings for each transformation in the data flow:
- Source:
Blob Storage account. Load the CSV data and select "First row as header".
- Map Drifted Columns:
This lets us reference the columns by name and perform actions on them in the following transformations.
- Derived Columns:
Here, we add the dummy column with a constant value of 1, and a children column that will later hold the array of JSON objects.
To build the children column, go to Expressions -> Expression Builder -> click on children -> add 4 sub-columns named key1, key2, key3, key4.
Click on each key and pass the corresponding input column as its expression (see the snip below).
Click Save and finish.
- Aggregate By Dummy:
In this transformation, we group the data by the dummy column we added and collect all values of children. This builds an array of JSON objects instead of a JSON of JSONs.
Click on the transformation -> Group by: dummy -> Aggregates: children -> collect(children).
- Drop Dummy Column:
Select only the children array.
- Sink:
Blob Storage account; we write the result to the sink.
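As a rough local mirror of the data flow steps above, the following Python sketch chains the Derived Column, Aggregate, Select and Sink steps on in-memory rows. It is not ADF's engine or API: the input column names `col1`..`col4`, the `KEY_MAPPING` dict and all function names are hypothetical, and the sink stand-in only returns a string instead of writing to Blob Storage.

```python
import json

# Hypothetical rows as the Source step would read them (first row used as header).
# The input column names col1..col4 are assumptions; adapt them to your CSV.
source_rows = [
    {"col1": "a1", "col2": "b1", "col3": "c1", "col4": "d1"},
    {"col1": "a2", "col2": "b2", "col3": "c2", "col4": "d2"},
]

# Derived Column mapping: sub-column name -> input column (assumed names).
KEY_MAPPING = {"key1": "col1", "key2": "col2", "key3": "col3", "key4": "col4"}


def derive(rows):
    """Derived Column: add dummy = 1 and a children struct built from the input columns."""
    return [
        {**row, "dummy": 1, "children": {k: row[src] for k, src in KEY_MAPPING.items()}}
        for row in rows
    ]


def aggregate_by_dummy(rows):
    """Aggregate: group by dummy and collect the children values into one list per group."""
    groups = {}
    for row in rows:
        groups.setdefault(row["dummy"], []).append(row["children"])
    return [{"dummy": d, "children": c} for d, c in groups.items()]


def select_children(rows):
    """Select: drop the dummy column and keep only the children array."""
    return [{"children": row["children"]} for row in rows]


def sink(rows):
    """Sink stand-in (hypothetical): serialise each output row as one JSON line."""
    return "\n".join(json.dumps(row) for row in rows)


print(sink(select_children(aggregate_by_dummy(derive(source_rows)))))
```

Because every row carries dummy = 1, the aggregate yields a single row, so the sink stand-in emits one JSON document shaped like the expected output.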
Output:
Updated Aug 15, 2022
Version 1.0
Sally_Dabbah
Microsoft
FastTrack for Azure