Azure Data Factory Blog

Read and Write Complex Data Types in ADF

Mark Kromer
Microsoft
Oct 12, 2020

ADF has connectors for the Parquet, Avro, and ORC data lake file formats. However, datasets used by the Copy activity do not currently support the complex data types (structs, arrays, and maps) those formats can contain. Here is how to read and write those complex columns in ADF by using data flows.
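
For context, "complex" here means nested types such as structs, arrays, and maps. As a running example, suppose the Parquet file holds records shaped like this (hypothetical data, shown as JSON for readability):

```json
{
  "orderId": 1001,
  "customer": { "name": "Contoso", "city": "Seattle" },
  "items": [
    { "sku": "A-100", "qty": 2 },
    { "sku": "B-200", "qty": 1 }
  ]
}
```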

There is a description of this technique on each file format's documentation page in the ADF online docs:

https://docs.microsoft.com/en-us/azure/data-factory/format-orc#dataset-properties

https://docs.microsoft.com/en-us/azure/data-factory/format-parquet#data-type-support

https://docs.microsoft.com/en-us/azure/data-factory/format-avro#data-flows 

Step 1: Create a new dataset and choose the file format type. In this example, I am using Parquet. When prompted for Import schema, choose None.
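
If you author the dataset as JSON rather than through the UI, an empty schema array accomplishes the same thing. A minimal sketch, assuming an ADLS Gen2 linked service; the dataset name, linked service name, and file path are all placeholders:

```json
{
    "name": "ComplexParquetDataset",
    "properties": {
        "type": "Parquet",
        "linkedServiceName": {
            "referenceName": "AzureDataLakeStorageLS",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "location": {
                "type": "AzureBlobFSLocation",
                "fileSystem": "data",
                "fileName": "orders.parquet"
            }
        },
        "schema": []
    }
}
```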

Step 2: Make a data flow with this new dataset as the source.
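
In JSON terms, the data flow only needs a source transformation that references the dataset. A rough sketch (names are placeholders; schema drift is left on so columns flow through before you import the projection):

```json
{
    "name": "ComplexTypesDataFlow",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                {
                    "name": "ParquetSource",
                    "dataset": {
                        "referenceName": "ComplexParquetDataset",
                        "type": "DatasetReference"
                    }
                }
            ],
            "sinks": [],
            "transformations": [],
            "scriptLines": [
                "source(allowSchemaDrift: true,",
                "  validateSchema: false) ~> ParquetSource"
            ]
        }
    }
}
```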

Step 3: Go to Projection -> Import Projection.
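
Importing the projection writes the discovered structure into the data flow script, where structs appear as parenthesized sub-schemas and arrays carry a [] suffix. For the hypothetical orders file above, the source's scriptLines would end up looking roughly like this:

```json
"scriptLines": [
    "source(output(",
    "    orderId as integer,",
    "    customer as (name as string, city as string),",
    "    items as (sku as string, qty as integer)[]",
    "  ),",
    "  allowSchemaDrift: true,",
    "  validateSchema: false) ~> ParquetSource"
]
```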

Step 4: You’ll see your data, complex columns included, under Data Preview.


1 Comment

  • preetijaiswal
    Copper Contributor

    Hi Mark Kromer,

    Your articles are amazing and really helpful!

    I have been trying to dynamically upload multiple CSV files from on-prem to a SQL Server hosted in AWS. Please advise me if you have a way to do that.

    I did this by using a self-hosted IR for the file server and a Get Metadata > ForEach > Copy activity pattern. All the CSV files migrated to SQL Server successfully, but with one problem: while mapping, every CSV column was treated as a string, so all the table columns in SQL Server came out as nvarchar(max). That differs from what I get when I manually import the CSVs with the SSIS import option, where the columns have proper data types like date, int, bigint, etc.

    For my requirement, I need to make this task as dynamic as possible, with correct data types, using ADF only. Would you please suggest another way, or correct me where I am wrong? Someone also suggested invoking a Python file from ADF with a script that gets the table metadata from the CSV, passes it to the pipeline, and then creates the table. Due to cost issues I can't use Databricks.

    I don't know how to do this as I am a newbie. Please help me out by throwing some light on it.

    Thanks