I have the below scenarios as part of my business requirement. These requirements have to be achieved dynamically using Azure Data Factory Data Flows or Pipelines.
Note: the requirement is not to use Function Apps, Databricks, or any other API calls.
I have a blob storage container that always holds CSV files with varying headers (both the headers and the content change all the time). I want to convert these CSV files to Parquet files after performing a couple of validations, which are described below.
Need to loop through each file in the source blob folder.
Need to get the row count of each file dynamically (a pipeline Lookup activity can't be used because the files contain millions of rows).
Use the count as conditional logic to decide whether to continue to the next step.
In the next step I need to validate the CSV data to find any invalid rows. For example, I'm using the comma (,) as the column delimiter in my dataset, so any string value that is not enclosed in double quotes ("") and contains a comma within it gets treated as an extra column with no header column name. Rows like this should be treated as invalid and moved to another blob storage folder as a ".CSV" file.
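To make the intent of these steps concrete, below is a minimal Python sketch of the logic I'm trying to reproduce. It is purely illustrative: the folder names are placeholders, and the actual solution has to stay within Data Flows/Pipelines rather than code.

```python
import csv
import os

SOURCE_DIR = "source_csvs"    # placeholder for the source blob folder
INVALID_DIR = "invalid_csvs"  # placeholder for the folder that receives invalid rows

for name in os.listdir(SOURCE_DIR):                      # loop through each file
    if not name.lower().endswith(".csv"):
        continue

    row_count = 0
    invalid_rows = []
    with open(os.path.join(SOURCE_DIR, name), newline="") as f:
        reader = csv.reader(f)                            # honours double-quoted values
        header = next(reader, [])
        for row in reader:
            row_count += 1                                # row count, streamed
            # A comma inside an unquoted value produces extra fields,
            # so the row no longer matches the header width -> invalid.
            if len(row) != len(header):
                invalid_rows.append(row)

    if row_count == 0:        # the count gates the next step (done in one pass here)
        continue

    if invalid_rows:          # move the invalid rows to another folder as .CSV
        with open(os.path.join(INVALID_DIR, name), "w", newline="") as out:
            writer = csv.writer(out)
            writer.writerow(header)
            writer.writerows(invalid_rows)

    # The remaining valid rows would then be written out as Parquet (omitted).
```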
For example, the source CSV file may look like this: