Introduction
Azure Data Factory (ADF) is well suited to data transformation. In this blog we will discuss how to convert a CSV file into JSON and explain how the Aggregate transformation is used along the way.
Main Idea
In ADF, a JSON object is a complex data type, and we want to build an array that consists of JSON objects. The idea is to create a data flow, add a key "children" to the data, and aggregate the JSON objects into an array using the Aggregate transformation. Because an aggregate needs something to group by, we add a dummy column with a constant value of 1 and group by it, so that every row falls into the same group and ends up in the same array.
We will require:
Basic knowledge of ADF, including how to create a new pipeline and how to add activities and data flows to a pipeline.
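Source: A Blob storage account. Load the CSV data and select the option to treat the first row as a header.
Map Drifted Columns: This maps the drifted columns to named columns, which gives us the ability to reference them and perform actions on them in the following steps.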
Derived Columns: Here we add two columns: the dummy column, with a constant value of 1, and a children column, which will later hold the array of JSON objects. To build the children column, go to Expressions -> Expression Builder -> click on children -> add 4 subcolumns named key1, key2, key3 and key4.
Click on each key and pass the corresponding source column as the input (expression) for that key (see the screenshot below).
Click Save and finish.
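To make the result of the Derived Columns step easier to picture, here is a rough plain-Python analogy. It is not ADF code, only a sketch of the row shape after the step. The source column names col1 to col4 and the sample values are assumptions made for illustration; the names dummy, children and key1 to key4 come from the step above.

```python
import csv
import io

# Hypothetical CSV input; the column names col1..col4 and the values are assumptions.
CSV_TEXT = "col1,col2,col3,col4\na,b,c,d\ne,f,g,h\n"


def add_derived_columns(row):
    """Mimic the Derived Columns step: add a constant dummy column
    and a nested children object built from the source columns."""
    return {
        **row,
        "dummy": 1,
        "children": {
            "key1": row["col1"],
            "key2": row["col2"],
            "key3": row["col3"],
            "key4": row["col4"],
        },
    }


# DictReader uses the first row as the header, like the Source setting.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
derived = [add_derived_columns(r) for r in rows]
```

At this point every row carries its own children object and the same dummy value, which is what makes the grouping in the next step possible.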
Aggregate By Dummy: In this step we group the data by the dummy column that we added and collect all the values of children. Since every row has the same dummy value, all rows land in a single group, which gives us one array of JSON objects instead of a JSON of JSONs. Click on the transformation -> group by dummy -> aggregates -> children -> collect(children)
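The effect of grouping by a constant and collecting can be sketched in plain Python as follows. This is only an analogy for what collect(children) does under a group by on dummy, not ADF code; the helper name aggregate_collect and the sample values are hypothetical.

```python
from collections import defaultdict


def aggregate_collect(rows, group_key="dummy", value_key="children"):
    """Group rows by group_key and collect each row's value_key into a list."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    # One output row per distinct group value.
    return [{group_key: key, value_key: values} for key, values in groups.items()]


# Sample rows as they might look after the Derived Columns step (values are made up).
rows = [
    {"dummy": 1, "children": {"key1": "a", "key2": "b", "key3": "c", "key4": "d"}},
    {"dummy": 1, "children": {"key1": "e", "key2": "f", "key3": "g", "key4": "h"}},
]

aggregated = aggregate_collect(rows)
```

Because dummy is the same for every row, the result is a single row whose children value is a list holding every object, in input order.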
Drop Dummy Column: Select only the children array, so that the dummy column does not appear in the output.
Sink: A Blob storage account, where we will write the resulting JSON.
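As a final illustration, the last two steps (dropping the dummy column and writing the result) can be sketched in plain Python. Again this is an analogy under assumed sample data, not what ADF runs internally; the helper name select_children is hypothetical, and the exact layout of the file that ADF writes depends on the sink settings.

```python
import json

# A single aggregated row, as produced by grouping on dummy (values are made up).
aggregated_row = {
    "dummy": 1,
    "children": [
        {"key1": "a", "key2": "b", "key3": "c", "key4": "d"},
        {"key1": "e", "key2": "f", "key3": "g", "key4": "h"},
    ],
}


def select_children(row):
    """Keep only the children array, dropping the dummy column."""
    return {"children": row["children"]}


# Serialize the remaining column; in ADF the sink would write this to Blob storage.
output = json.dumps(select_children(aggregated_row), indent=2)
print(output)
```

The printed document contains a single children key whose value is the array of JSON objects built from the CSV rows.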