Azure Data Factory Enables Data Wrangling at Scale with Power Query
Published Jan 21 2021 02:52 PM 12.7K Views
Microsoft

The Azure Data Factory team is excited to announce a new update to the ADF data wrangling feature, currently in public preview. Wrangling in ADF empowers users to build code-free data prep and wrangling at cloud scale using the familiar Power Query data-first interface, natively embedded into ADF. Power Query provides a visual interface for data preparation and is used across many products and services. With Power Query embedded in ADF, you can use the PQ editor to explore and profile data as well as turn your M queries into scaled-out data prep pipeline activities. Data Flows in ADF and Synapse Analytics will now focus on Mapping Data Flows with a logic-first design paradigm, while the Power Query interface will enable the data-first wrangling scenario.

 

 

With Power Query in ADF, you now have a powerful tool to use in your ADF ETL projects for data profiling, data prep, and data wrangling. You have immediate feedback from introspection of your Lake and database data with the Power Query M language available for your data exploration. You can then take your resulting mash-up and save it as a first-class ADF object and orchestrate a data pipeline with that same M Power Query executing on Spark.

 

power-query-activity.png

 

When you have completed your data exploration, save your work as a Power Query object and then add it as a Power Query activity on the ADF pipeline canvas. With your Power Query activity inside of a pipeline, ADF will execute your M query on Spark so that your activity will automatically scale with your data by leveraging the ADF data flow infrastructure.

 

power-query-pipeline.png

 

In the example above, I added my Power Query activity to my pipeline for cleaning addresses from my ingested Lake data folders with Power Query, then handing the results off to a Data Flow via ADLS Gen2, where I perform data deduplication and then use the ADF pipeline to send emails when the process completes.

 

Because you are in the context of an ADF pipeline, you can define destination sinks for your Power Query mash-up so that you can persist the results of your transformations to data store like ADLS Gen2 storage or Synapse Analytics SQL Pools. Leverage the power of ADF to define source and destination mappings, database table settings, file and folder options, and other important data pipeline properties that data engineers need when building scalable data pipelines in ADF.

 

Click here to learn more about Azure Data Factory and the power of data wrangling at cloud scale with the new updated Power Query public preview feature in ADF.

 

8 Comments
Copper Contributor

Nice!!

What is it using under the hood?

 

Brass Contributor

Power Query the World! :beaming_face_with_smiling_eyes:

Microsoft

@timrobertsusa Spark computing is used under hood to execute PowerQuery  expressions in a managed cluster. That gives you performance and scalability. 

Copper Contributor
I tried it's an awesome feature

Hi,
One question - Power QUery is only available in ADF at the moment, right? I cannot find it in Synapse Pipelines.. 

Microsoft

@wstrasser Correct. This feature has not yet been ported to Synapse. 

Thank you @Mark Kromer !

Copper Contributor

This is a great feature @Mark Kromer - I tried creating a power query that uses Table.FuzzyNestedJoin but getting an error when I publish of the The Power Query Spark Runtime not supporting that function.  Is that something that is in development?

Version history
Last update:
‎Jan 24 2021 12:49 AM
Updated by: