Azure Data Factory Blog

ADF Data Flows Script Intro: Copy, Paste, Snippets

Mark Kromer
Microsoft
Mar 26, 2020

ADF Data Flows provides a low-code, graph-based UI for designing and developing data transformations at scale, but the script behind those graphs is also very powerful. This article introduces the Data Flow Script and shows how to use it to copy, paste, and share data transformation code.

Below is a sample data flow where I have a fairly complex Aggregate transformation that I wish to use in another data flow. To do that, I'm going to click the Script button in the ADF Data Flow design UI to view the script behind it.

Details of the Data Flow Script are covered on the ADF documentation page.

Each data flow starts with a source transformation, and each subsequent transformation in the script begins with the name of its incoming stream. We want to share the aggregate called SummaryStats. To find it, click Script, then press Ctrl-F to open Find/Replace and search for "SummaryStats". To share the transformation, copy from the aggregate() function to the end of the expression, stopping just before the transformation name. A transformation is named with the syntax ~> followed by the name. When I share this, I do not copy the transformation name, since I can assign a new name when I paste it.
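To make that layout concrete, here is a minimal, hypothetical script sketch. The source name, columns, and aggregate expression are invented for illustration; only the SummaryStats name comes from the flow above. Note how the aggregate line leads with MoviesSource, its incoming stream, and ends with ~> SummaryStats, its name:

```
source(output(
        movie as string,
        rating as double
    ),
    allowSchemaDrift: true) ~> MoviesSource
MoviesSource aggregate(groupBy(movie),
    avgRating = avg(rating)) ~> SummaryStats
```

To lift the aggregate into another flow, you would copy everything from aggregate( up to, but not including, ~> SummaryStats, then paste it after a stream name in the target flow and give it a new name.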

Here is the code that gets copied:

aggregate(each(match(true()), $$+'_NotNull' = countIf(!isNull($$)), $$ + '_Null' = countIf(isNull($$))),each(match(type=='double'||type=='integer'||type=='short'||type=='decimal'), $$+'_stddev' = round(stddev($$),2), $$ + '_min' = min($$), $$ + '_max' = max($$), $$ + '_average' = round(avg($$),2), $$ + '_variance' = round(variance($$),2)), each(match(type=='string'), $$+'_maxLength' = max(length($$))))

This captures all of my expression-language work, so I can share that code with other data flows. I can also use the script-behind to rename my transformations with find/replace.

I've started including shareable data flow script snippets in the script docs. You can copy those snippet blocks and paste them into your own data flows. For an example of how to use the code snippets, watch toward the end of the video above.

Updated Mar 26, 2020
Version 2.0
  • FelixVita
    Copper Contributor

    Mark Kromer that's great news! I assume this multi-line dataflow script will be formatted as JSON then.

    Am I allowed to ask which quarter? CY22 could mean anything from Q1 to Q4.

  • We will be converting the single-line script into multiple lines soon, in an updated release in CY22.

  • FelixVita
    Copper Contributor

    Hi there Mark,

    Having the data flow script (DFS) as a single line makes version control difficult.

    For example, if I change 3 of 400 lines in a DFS, GitHub reports it as just 1 change, since all 400 of those lines are collapsed onto a single line in the dataflow JSON file.

    And sometimes GitHub will say the DFS changed even though it functionally did not. This happens because the formatting of the DFS (or the order of the parameters therein) can occasionally change, even if none of the actual values changed.

    I've tried writing a Python script to automatically parse the DFS of any dataflow into JSON format. I aim to use it to go through any number of dataflow files and obtain proper "diffs" between those and a corresponding set of earlier/different dataflow files. But this has proved easier said than done, mainly because the DFS appears to be written in a non-standard data format that I'm unable to identify or parse easily. My script is currently a clunky, error-prone mess of regexes, loops, and conditions. Is there a better way?
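    Until multi-line scripts arrive, one workaround for the diff problem is to normalize the script yourself before committing. Below is a minimal sketch, not a real DFS parser, and split_dfs_line is an invented name, not an ADF API: it breaks a single-line expression after each top-level comma, skipping commas nested inside inner calls or single-quoted strings, so line-based diff tools see one argument per line. It does not handle escaped quotes inside string literals.

```python
def split_dfs_line(script: str, indent: str = "    ") -> str:
    """Break a single-line Data Flow Script expression into multiple lines.

    Illustrative workaround only: inserts a newline after each comma at
    parenthesis depth 1, leaving commas inside nested calls and inside
    single-quoted string literals untouched.
    """
    out = []
    depth = 0
    in_string = False
    for ch in script:
        out.append(ch)
        if ch == "'":
            in_string = not in_string       # DFS string literals use single quotes
        elif not in_string:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
            elif ch == "," and depth == 1:
                out.append("\n" + indent)   # break after a top-level argument
    return "".join(out)


print(split_dfs_line("aggregate(groupBy(movie),avgRating = avg(rating))"))
# prints:
# aggregate(groupBy(movie),
#     avgRating = avg(rating))
```

    Because only whitespace is inserted, the transformation is reversible, so the normalized file still round-trips back to the original single-line script.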

    Markus