Forum Discussion
AjayGopu
May 10, 2024
Copper Contributor
How to Achieve a Couple of Functionalities in Azure Data Factory Dynamically
Hi Team, I have the below scenarios as part of my business requirement. These requirements have to be achieved dynamically using Azure Data Factory Data Flows or Pipelines. Note: Requirement is not...
Kidd_Ip
Feb 04, 2025
MVP
Here's how you could approach this in Azure Data Factory (a JSON sketch of the pipeline skeleton follows the outline below):
- Create a Pipeline:
- Create a new pipeline in Azure Data Factory.
- Add Activities to Pipeline:
- Get Metadata Activity: Use this activity to list all the files in the source blob folder.
- ForEach Activity: Loop through each file using the output of the Get Metadata activity.
- Inside the ForEach Activity:
- Copy Activity: Copy the file content from the source blob to a staging area in Azure Data Lake Storage Gen2.
- Data Flow Activity: Add a Data Flow activity to process the CSV files and perform validations.
- Create a Data Flow:
- Source Transformation: Add a source transformation to read the CSV file from the staging area.
- Surrogate Key Transformation: Add a surrogate key transformation to generate a row number for each row (a derived column cannot generate row numbers on its own).
- Conditional Split Transformation: Add a conditional split transformation to separate valid and invalid rows.
- Valid Rows: Rows that meet the criteria.
- Invalid Rows: Rows that contain invalid data.
- Sink Transformation: Add two sink transformations:
- Valid Rows Sink: Write the valid rows to a Parquet file in the destination folder.
- Invalid Rows Sink: Write the invalid rows to a separate CSV file in another blob storage folder.
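A minimal JSON sketch of the pipeline skeleton, assuming hypothetical names (GetFileList stands in for the Get Metadata activity, ForEachFile for the ForEach) and a SourceBlobFolder dataset pointing at the source container; adjust these to match your own datasets:

```json
{
  "name": "ProcessCsvFiles",
  "properties": {
    "activities": [
      {
        "name": "GetFileList",
        "type": "GetMetadata",
        "typeProperties": {
          "dataset": { "referenceName": "SourceBlobFolder", "type": "DatasetReference" },
          "fieldList": [ "childItems" ]
        }
      },
      {
        "name": "ForEachFile",
        "type": "ForEach",
        "dependsOn": [ { "activity": "GetFileList", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('GetFileList').output.childItems", "type": "Expression" },
          "activities": [ ]
        }
      }
    ]
  }
}
```

The Copy and Execute Data Flow activities from the outline go inside the ForEach activity's activities array; a sketch of those inner activities follows the pipeline example below.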
Example Pipeline and Data Flow:
Pipeline
- Get Metadata Activity:
- Configure the activity to list all the files in the source blob folder.
- Output: List of file names.
- ForEach Activity:
- Items: @activity('Get Metadata Activity').output.childItems
- Inside the ForEach activity, add the following activities:
- Copy Activity:
- Source: Source blob storage.
- Sink: Staging area in Azure Data Lake Storage Gen2.
- Data Flow Activity:
- Parameters: Pass the file name to the data flow.
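Inside the ForEach activity, the two inner activities could look roughly like the sketch below. The dataset names (SourceBlobCsv, StagingAdlsCsv), the fileName dataset parameter, and the data flow name ValidateCsvDataFlow are all placeholders; the datasetParameters block assumes the data flow's staging dataset exposes a fileName parameter:

```json
[
  {
    "name": "CopyToStaging",
    "type": "Copy",
    "inputs": [
      {
        "referenceName": "SourceBlobCsv",
        "type": "DatasetReference",
        "parameters": { "fileName": { "value": "@item().name", "type": "Expression" } }
      }
    ],
    "outputs": [
      {
        "referenceName": "StagingAdlsCsv",
        "type": "DatasetReference",
        "parameters": { "fileName": { "value": "@item().name", "type": "Expression" } }
      }
    ],
    "typeProperties": {
      "source": { "type": "DelimitedTextSource" },
      "sink": { "type": "DelimitedTextSink" }
    }
  },
  {
    "name": "ValidateAndSplit",
    "type": "ExecuteDataFlow",
    "dependsOn": [ { "activity": "CopyToStaging", "dependencyConditions": [ "Succeeded" ] } ],
    "typeProperties": {
      "dataflow": {
        "referenceName": "ValidateCsvDataFlow",
        "type": "DataFlowReference",
        "datasetParameters": {
          "stagedCsv": { "fileName": { "value": "@item().name", "type": "Expression" } }
        }
      }
    }
  }
]
```

If you prefer, the file name can instead be passed as a data flow parameter on the activity's Parameters tab and referenced in the source transformation.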
Data Flow
- Source Transformation:
- Source: Staging area in Azure Data Lake Storage Gen2.
- Options: Enable "First row as header" and set the column delimiter to a comma.
- Surrogate Key Transformation:
- Add a row number column, e.g. RowNumber starting at 1. (There is no rownum() function in the derived column expression language; a Window transformation with rowNumber() is an alternative if you need a specific ordering.)
- Conditional Split Transformation:
- Valid Rows Condition: length(toString(TestColumn1)) > 0 && length(toString(TestColumn2)) > 0 && length(toString(TestColumn3)) > 0
- Invalid Rows: rows that do not match the condition above fall through to the split's default output stream, so no separate negated condition is needed.
- Sink Transformation:
- Valid Rows Sink: Write to Parquet file.
- Invalid Rows Sink: Write to CSV file in another blob storage folder.
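For reference, the whole data flow can be expressed as JSON with a data flow script. This is only a rough sketch under a few assumptions: placeholder dataset names (StagingAdlsCsv, ValidParquet, InvalidCsv), a Surrogate Key transformation generating the row number, and the three TestColumn fields from the split condition:

```json
{
  "name": "ValidateCsvDataFlow",
  "properties": {
    "type": "MappingDataFlow",
    "typeProperties": {
      "sources": [
        { "name": "stagedCsv", "dataset": { "referenceName": "StagingAdlsCsv", "type": "DatasetReference" } }
      ],
      "sinks": [
        { "name": "sinkValid", "dataset": { "referenceName": "ValidParquet", "type": "DatasetReference" } },
        { "name": "sinkInvalid", "dataset": { "referenceName": "InvalidCsv", "type": "DatasetReference" } }
      ],
      "transformations": [
        { "name": "addRowNumber" },
        { "name": "splitValidity" }
      ],
      "scriptLines": [
        "source(allowSchemaDrift: true,",
        "     validateSchema: false) ~> stagedCsv",
        "stagedCsv keyGenerate(output(RowNumber as long),",
        "     startAt: 1L) ~> addRowNumber",
        "addRowNumber split(length(toString(TestColumn1)) > 0 && length(toString(TestColumn2)) > 0 && length(toString(TestColumn3)) > 0,",
        "     disjoint: false) ~> splitValidity@(validRows, invalidRows)",
        "validRows sink(allowSchemaDrift: true,",
        "     validateSchema: false) ~> sinkValid",
        "invalidRows sink(allowSchemaDrift: true,",
        "     validateSchema: false) ~> sinkInvalid"
      ]
    }
  }
}
```

Rows that fail the split condition fall through to the invalidRows stream and land in the CSV sink; the Parquet and CSV formats themselves come from the two sink datasets.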