ADF Data Flow Expressions: Tips and Tricks

Mark Kromer

Microsoft

Feb 07, 2020

Here are some tips and shortcuts that you can use inside of the expression builder in ADF's Mapping Data Flows:

Keyboard shortcuts

Ctrl-K Ctrl-C: Comments entire line
Ctrl-K Ctrl-U: Uncomment
F1: Provide editor help commands
Alt-Down Arrow: Move current line down
Alt-Up Arrow: Move current line up
Cntrl-Space: Show context help

Manual Comments


/* This is a
   multi-line comment */
   
// This is a single line comment

TIP: If you put a comment at the top of your expression, it will appear in the transformation text box to document your transformation expressions:

Convert Date to String format

toString(toDate('28/03/2010', 'dd/MM/yyyy'), 'ddMMMyyyy') = 28Mar2010

This list below is from the Java doc for date formats:

All letters 'A' to 'Z' and 'a' to 'z' are reserved as pattern letters. The following pattern letters are defined:

  Symbol  Meaning                     Presentation      Examples
  ------  -------                     ------------      -------
   G       era                         text              AD; Anno Domini; A
   u       year                        year              2004; 04
   y       year-of-era                 year              2004; 04
   D       day-of-year                 number            189
   M/L     month-of-year               number/text       7; 07; Jul; July; J
   d       day-of-month                number            10

   Q/q     quarter-of-year             number/text       3; 03; Q3; 3rd quarter
   Y       week-based-year             year              1996; 96
   w       week-of-week-based-year     number            27
   W       week-of-month               number            4
   E       day-of-week                 text              Tue; Tuesday; T
   e/c     localized day-of-week       number/text       2; 02; Tue; Tuesday; T
   F       week-of-month               number            3

   a       am-pm-of-day                text              PM
   h       clock-hour-of-am-pm (1-12)  number            12
   K       hour-of-am-pm (0-11)        number            0
   k       clock-hour-of-am-pm (1-24)  number            0

   H       hour-of-day (0-23)          number            0
   m       minute-of-hour              number            30
   s       second-of-minute            number            55
   S       fraction-of-second          fraction          978
   A       milli-of-day                number            1234
   n       nano-of-second              number            987654321
   N       nano-of-day                 number            1234000000

   V       time-zone ID                zone-id           America/Los_Angeles; Z; -08:30
   z       time-zone name              zone-name         Pacific Standard Time; PST
   O       localized zone-offset       offset-O          GMT+8; GMT+08:00; UTC-08:00;
   X       zone-offset 'Z' for zero    offset-X          Z; -08; -0830; -08:30; -083015; -08:30:15;
   x       zone-offset                 offset-x          +0000; -08; -0830; -08:30; -083015; -08:30:15;
   Z       zone-offset                 offset-Z          +0000; -0800; -08:00;

   p       pad next                    pad modifier      1

   '       escape for text             delimiter
   ''      single quote                literal           '
   [       optional section start
   ]       optional section end
   #       reserved for future use
   {       reserved for future use
   }       reserved for future use

Concat Strings Shortcut

'This is my string.' + ' This is my new string.' = This is my string. This is my new string

Regexp

https://kromerbigdata.com/2019/01/02/azure-data-factory-data-flow-transform-data-with-regular-expressions/

regexReplace(Address1,[ ]{2}|\.,' ')
regex_extract(Address1, ^(\d+), 1)
rlike(City,'^[A-G]')

Good for Alter Row:

true()

Use it in your alter row filter will allow all rows to match that condition. Good for Upsert. No need to use 1==1.

Or, if you want inequality (1==0):

false()

Alter Row will tag rows with each matching rule. Check for row tags based on Alter Row matching rules:

isUpdate(), isUpsert(), isDelete(), isInsert()

Use byName() to access "hidden fields"

When you are working in the ADF Data Flow UI, you can see the metadata as you construct your transformations. The metadata is based on the projection of the source plus the columns defined in transformations. However, in some instances, you do not get the metadata due to schema drift, column patterns, or dynamic transformations like Pivot that create column names on the fly. In that case, you byName():

toString(byName('mynewcol'))

Fuzzy matching

Soundex(columnname)

isNull / coalesce

isNull (col1, 'somevalue')

coalesce(expression)

Lookup Match / No match

After your Lookup transformation, you can use subsequent transformations to inspect the results of each matched row by using the expression function isMatch() to make further choices in your logic based on whether or not the Lookup resulted in a row match or not.

Regex to remove non-alphanumeric chars

regexReplace(mystring,`^a-zA-Z\d\s:`,'')

Convert to Timestamp

toString(toTimestamp('12/31/2016T00:12:00', 'MM/dd/yyyy\'T\'HH:mm:ss'), 'MM/dd /yyyy\'T\'HH:mm:ss')

Note that to include string literals in your timestamp output, you need to wrap your conversion inside of a toString().

Here is how to convert seconds from Epoch to a date or timestamp:

toTimestamp(seconds(1575250977))

How can I create a derived column that is a nullable timestamp, like C# DateTime or SSIS NULL(DT_DATE)?

DateReported2 = CASE WHEN DateReported is null THEN DateReported WHEN YEAR(DateReported) = 1899 THEN NULL ELSE DateReported End ...

Solution:

case(year(DateReported) != 1899, DateReported)

Row Counts

To get Row Counts in Data Flows, add an Aggregate transformation, leave the Group By empty, then use count(1) as your aggregate function.

Distinct Rows

To get distinct rows in your Data Flows, use the Aggregate transformation, set the key(s) to use for distinct in your group by, then choose First($$) or Last($$) as your aggregate function using column patterns.