Azure Data Factory Enables Data Wrangling at Scale with Power Query

Microsoft

Jan 21, 2021

The Azure Data Factory team is excited to announce a new update to the ADF data wrangling feature, currently in public preview. Wrangling in ADF empowers users to build code-free data prep and wrangling at cloud scale using the familiar Power Query data-first interface, natively embedded into ADF. Power Query provides a visual interface for data preparation and is used across many products and services. With Power Query embedded in ADF, you can use the PQ editor to explore and profile data as well as turn your M queries into scaled-out data prep pipeline activities. Data Flows in ADF and Synapse Analytics will now focus on Mapping Data Flows with a logic-first design paradigm, while the Power Query interface will enable the data-first wrangling scenario.

With Power Query in ADF, you now have a powerful tool to use in your ADF ETL projects for data profiling, data prep, and data wrangling. You have immediate feedback from introspection of your Lake and database data with the Power Query M language available for your data exploration. You can then take your resulting mash-up and save it as a first-class ADF object and orchestrate a data pipeline with that same M Power Query executing on Spark.

When you have completed your data exploration, save your work as a Power Query object and then add it as a Power Query activity on the ADF pipeline canvas. With your Power Query activity inside of a pipeline, ADF will execute your M query on Spark so that your activity will automatically scale with your data by leveraging the ADF data flow infrastructure.

In the example above, I added my Power Query activity to my pipeline for cleaning addresses from my ingested Lake data folders with Power Query, then handing the results off to a Data Flow via ADLS Gen2, where I perform data deduplication and then use the ADF pipeline to send emails when the process completes.

Because you are in the context of an ADF pipeline, you can define destination sinks for your Power Query mash-up so that you can persist the results of your transformations to data store like ADLS Gen2 storage or Synapse Analytics SQL Pools. Leverage the power of ADF to define source and destination mappings, database table settings, file and folder options, and other important data pipeline properties that data engineers need when building scalable data pipelines in ADF.

Click here to learn more about Azure Data Factory and the power of data wrangling at cloud scale with the new updated Power Query public preview feature in ADF.

Updated Jan 24, 2021

Version 2.0

Microsoft

Joined August 14, 2018

View Profile

Azure Data Factory Blog

Follow this blog board to get notified when there's new activity