ADF Data Flow Adds Support For No Quote Character in Data Flows (Coming Soon)

Published Sep 06 2020 09:25 PM 3,931 Views
Microsoft

Azure Data Factory will soon add support to Data Flows for "no quote character" in DelimitedText datasets for both source and sink. QuoteChar is the single character to quote column values if the column data contains the column delimiter. The typical scenario is double quotes (“). But with no quote char, it means there is no quote char present and the column value is not quoted, while escapeChar is used to escape the column delimiter and itself.

 

You can find and set the property in the Delimited Text dataset type. 

 

*** Potential Breaking Change ***

If you currently are using the Delimited Text dataset in a dataflow that has quote char to “No quote char” and escape char set to any character except “No escape char”, this may result in unexpected behavior, depending upon your actual data, as described below.

 

If the dataset is used in dataflow source, the breaking change will occur only if your column value contains *multiple consecutive escape characters* (default: \); or it contains an escape character following with a column delimiter (default: ,). 

 

E.g.  

  1.  If original data has column values with a double escape character, like ab\\cd, the previous behavior will treat it as ab\\cd. But the new behavior will treat it as ab\cd. Since \\ are translated as escape ‘escape char’ itself. 
  2. If original data has value with an escape character following with a column delimiter like ab\,cd, the previous behavior will treat it as ab\ and cd, two columns. But the new behavior will treat it as ab,cd, one column. Since \, are translated as an escape column delimiter. 

 

If the dataset is used in a dataflow sink, the new behavior will occur if your column value contains column delimiters (default: ,)or it contains escape character (default: \)

 

E.g. 

  1.  If the original data has column values with a column delimiter like "ab,cd," the previous behavior will output "ab,cd". But the new behavior will output "ab\,cd". This is because the column delimiter needs to be escaped. 
  2.  If original data has column values with an escape character like abc\d, the previous behavior will output abc\d. But the new behavior will output abc\\d. Since escape character needs to be escaped. 

Mitigation 

  • If your pipeline doesn’t face these conditions mentioned above, you can just keep your pipeline as is. 
  • If your pipeline has breaking changes mentioned above, you can mitigate this by setting the quote char to a character that would not be part of your data values.
10 Comments
Version history
Last update:
‎Sep 06 2020 09:53 PM
Updated by: