Azure Data Factory Enables Data Wrangling at Scale with Power Query

Mark Kromer · ‎Jan 21 2021

The Azure Data Factory team is excited to announce a new update to the ADF data wrangling feature, currently in public preview. Wrangling in ADF empowers users to build code-free data prep and wrangling at cloud scale using the familiar Power Query data-first interface, natively embedded into ADF. Power Query provides a visual interface for data preparation and is used across many products and services. With Power Query embedded in ADF, you can use the PQ editor to explore and profile data as well as turn your M queries into scaled-out data prep pipeline activities. Data Flows in ADF and Synapse Analytics will now focus on Mapping Data Flows with a logic-first design paradigm, while the Power Query interface will enable the data-first wrangling scenario.

With Power Query in ADF, you now have a powerful tool to use in your ADF ETL projects for data profiling, data prep, and data wrangling. You have immediate feedback from introspection of your Lake and database data with the Power Query M language available for your data exploration. You can then take your resulting mash-up and save it as a first-class ADF object and orchestrate a data pipeline with that same M Power Query executing on Spark.

When you have completed your data exploration, save your work as a Power Query object and then add it as a Power Query activity on the ADF pipeline canvas. With your Power Query activity inside of a pipeline, ADF will execute your M query on Spark so that your activity will automatically scale with your data by leveraging the ADF data flow infrastructure.

In the example above, I added my Power Query activity to my pipeline for cleaning addresses from my ingested Lake data folders with Power Query, then handing the results off to a Data Flow via ADLS Gen2, where I perform data deduplication and then use the ADF pipeline to send emails when the process completes.

Because you are in the context of an ADF pipeline, you can define destination sinks for your Power Query mash-up so that you can persist the results of your transformations to data store like ADLS Gen2 storage or Synapse Analytics SQL Pools. Leverage the power of ADF to define source and destination mappings, database table settings, file and folder options, and other important data pipeline properties that data engineers need when building scalable data pipelines in ADF.

Click here to learn more about Azure Data Factory and the power of data wrangling at cloud scale with the new updated Power Query public preview feature in ADF.

timrobertsusa · ‎Jan 24 2021

Nice!!

What is it using under the hood?

Jason Kohlhoff · ‎Jan 29 2021

Power Query the World! :beaming_face_with_smiling_eyes:

Sunil_Sabat · ‎Feb 01 2021

@timrobertsusa Spark computing is used under hood to execute PowerQuery expressions in a managed cluster. That gives you performance and scalability.

Nived1002V · ‎Feb 02 2021

I tried it's an awesome feature

wstrasser · ‎Mar 08 2021

Hi,
One question - Power QUery is only available in ADF at the moment, right? I cannot find it in Synapse Pipelines..

Mark Kromer · ‎Mar 08 2021

@wstrasser Correct. This feature has not yet been ported to Synapse.

wstrasser · ‎Mar 08 2021

Thank you @Mark Kromer !

Paul_Elisii · ‎Aug 13 2021

This is a great feature @Mark Kromer - I tried creating a power query that uses Table.FuzzyNestedJoin but getting an error when I publish of the The Power Query Spark Runtime not supporting that function. Is that something that is in development?

Products (50)

Special Topics (27)

Video Hub (462)

Most Active Hubs

Most Active Hubs

Video Hub

Azure Data Factory Enables Data Wrangling at Scale with Power Query