XML added to Parse transformation in ADF and Synapse Data Flows

Published 05-10-2021 03:36 PM
Microsoft

The Parse transformation in Azure Data Factory (ADF) and Synapse Analytics data flows lets data engineers build ETL transformations that take documents embedded in string fields and parse them into their native types. For example, you can set parsing rules in the Parse transformation to handle JSON and delimited text strings and turn those fields into complex types. Now, we've updated Parse to also accept XML as a source type in your incoming string data.
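As a rough analogy outside ADF (this is not how the Parse transformation is implemented), the Python sketch below takes a row whose fields arrive as plain strings and turns an embedded JSON string and an embedded delimited string into native types. The row contents, the `|` delimiter, and the helper names `parse_json_field` and `parse_delimited_field` are hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from typing import List

# Hypothetical incoming row: every field arrives as a plain string,
# just as it would from a delimited text source.
row = {
    "json_col": '{"Customer": 122, "CompanyName": "Great Lakes Food Market"}',
    "delim_col": "122|Great Lakes Food Market",
}

def parse_json_field(text: str) -> dict:
    """Turn an embedded JSON document into native types (dict, int, str)."""
    return json.loads(text)

def parse_delimited_field(text: str, delimiter: str = "|") -> List[str]:
    """Split an embedded delimited string into its values.

    csv.reader is used instead of str.split so that quoted values
    containing the delimiter are handled correctly.
    """
    return next(csv.reader(io.StringIO(text), delimiter=delimiter))

parsed = {
    "json_col": parse_json_field(row["json_col"]),
    "delim_col": parse_delimited_field(row["delim_col"]),
}

print(parsed["json_col"]["Customer"] + 1)  # 123: now a real integer
print(parsed["delim_col"])  # ['122', 'Great Lakes Food Market']
```

In this sketch the delimited values stay strings; in a data flow you would typically declare the output column types in the Parse expression instead.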

In this example, I have a delimited text (CSV) data source. Because this is a plain text file, the XML document embedded in the column labeled "xml" is read as a string, so I can't treat it as a hierarchical structure. However, by adding the Parse transformation, I can select XML as the incoming embedded type and define "customers" as a new column with a hierarchical structure:

[Image: parsexml1.png]

  • Source XML data: <Customers><Customer>122</Customer><CompanyName>Great Lakes Food Market</CompanyName></Customers>

    • Expression: (Customers as (Customer as integer, CompanyName as string))
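To make the example concrete outside ADF, here is a rough Python analogy of what the Parse step does with this data: a CSV reader only returns the "xml" column as a flat string, and parsing it against a declared structure yields a nested, typed "customers" value. This is not ADF's implementation. The CSV content, the `SCHEMA` dict (a hand translation of the expression above, not ADF syntax), and the helpers `apply_schema` and `parse_xml_column` are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical CSV mirroring the example: the "xml" column holds an embedded
# XML document, but a CSV reader can only hand it back as a flat string.
CSV_TEXT = (
    "id,xml\n"
    '1,"<Customers><Customer>122</Customer>'
    '<CompanyName>Great Lakes Food Market</CompanyName></Customers>"\n'
)

# Hand translation (an assumption, not ADF syntax) of the Parse expression
#   (Customers as (Customer as integer, CompanyName as string))
# Nested dicts describe child elements; callables cast a leaf's text.
SCHEMA = {"Customers": {"Customer": int, "CompanyName": str}}

def apply_schema(elem, spec):
    """Recursively convert an XML element into nested dicts following spec."""
    if callable(spec):
        # Leaf element: cast its text, keeping missing text as None.
        return None if elem.text is None else spec(elem.text)
    out = {}
    for name, child_spec in spec.items():
        child = elem.find(name)  # first direct child with this tag
        out[name] = None if child is None else apply_schema(child, child_spec)
    return out

def parse_xml_column(text, schema):
    """Parse an XML string and shape it according to a single-root schema."""
    root = ET.fromstring(text)
    root_name, root_spec = next(iter(schema.items()))
    if root.tag != root_name:
        raise ValueError(f"expected root <{root_name}>, got <{root.tag}>")
    return {root_name: apply_schema(root, root_spec)}

rows = []
for record in csv.DictReader(io.StringIO(CSV_TEXT)):
    # record["xml"] is still a plain str at this point.
    record["customers"] = parse_xml_column(record["xml"], SCHEMA)
    rows.append(record)

print(rows[0]["customers"])
# {'Customers': {'Customer': 122, 'CompanyName': 'Great Lakes Food Market'}}
```

This sketch ignores attributes, repeated elements, and namespaces; it only shows how a declared structure turns a flat string into typed, nested values.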
Last update: May 10 2021 03:36 PM