Using XML Source
Published Mar 25 2019 01:59 PM 515 Views
Copper Contributor
First published on MSDN on Dec 12, 2007

Providing a generic way to extract data from XML documents of any format is pretty complicated. The SSIS XML Source Adapter tries to do just that, to various degrees of success.

XML Source is easiest to use when your input has a simple element/sub-element structure:

<rootgoo>
<goo>
<subgoo>value</subgoo>
<moregoo>1</moregoo>
</goo>
<goo>
<subgoo>value</subgoo>
<moregoo>2</moregoo>
</goo>
</rootgoo>
<br/><br/>.csharpcode, .csharpcode pre<br/>{<br/> font-size: small;<br/> color: black;<br/> font-family: consolas, "Courier New", courier, monospace;<br/> background-color: #ffffff;<br/> /*white-space: pre;*/<br/>}<br/>.csharpcode pre { margin: 0em; }<br/>.csharpcode .rem { color: #008000; }<br/>.csharpcode .kwrd { color: #0000ff; }<br/>.csharpcode .str { color: #006080; }<br/>.csharpcode .op { color: #0000c0; }<br/>.csharpcode .preproc { color: #cc6633; }<br/>.csharpcode .asp { background-color: #ffff00; }<br/>.csharpcode .html { color: #800000; }<br/>.csharpcode .attr { color: #ff0000; }<br/>.csharpcode .alt <br/>{<br/> background-color: #f4f4f4;<br/> width: 100%;<br/> margin: 0em;<br/>}<br/>.csharpcode .lnum { color: #606060; }

Or when the values are listed as attributes (such as the output you get from SELECT ... FOR XML RAW statements in SQL Server).



<root>
<row CustomerID="1" TerritoryID="1" AccountNumber="AW00000001" />
<row CustomerID="2" TerritoryID="1" AccountNumber="AW00000002" />
</root>

<br/><br/>.csharpcode, .csharpcode pre<br/>{<br/> font-size: small;<br/> color: black;<br/> font-family: consolas, "Courier New", courier, monospace;<br/> background-color: #ffffff;<br/> /*white-space: pre;*/<br/>}<br/>.csharpcode pre { margin: 0em; }<br/>.csharpcode .rem { color: #008000; }<br/>.csharpcode .kwrd { color: #0000ff; }<br/>.csharpcode .str { color: #006080; }<br/>.csharpcode .op { color: #0000c0; }<br/>.csharpcode .preproc { color: #cc6633; }<br/>.csharpcode .asp { background-color: #ffff00; }<br/>.csharpcode .html { color: #800000; }<br/>.csharpcode .attr { color: #ff0000; }<br/>.csharpcode .alt <br/>{<br/> background-color: #f4f4f4;<br/> width: 100%;<br/> margin: 0em;<br/>}<br/>.csharpcode .lnum { color: #606060; }Both of these XML documents would produce a single output in the XML Source. When your XML data starts to get more complicated (multiple levels of elements / attributes), you start getting more outputs. These outputs are linked with IDs, which you will have to join yourself with a merge join transform.



Dealing with multiple outputs

In this completely made up example. the XML is a little more complex.



<extract date="2007-12-05">
<counters>
<counter category="dispatcher" name="server1">
<runtime>6</runtime>
<queue>3</queue>
<maxrequest>8</maxrequest>
<color>blue</color>
<host>
<name>svo2555</name>
<path>\\dispatcher</path>
<lastaccessed>2007-02-03</lastaccessed>
</host>
</counter>
<counter category="gateway" name="server1">
<runtime>1</runtime>
<queue>10</queue>
<maxrequest>10</maxrequest>
<color>purple</color>
<host>
<name>svo2555</name>
<path>\\gateway</path>
<lastaccessed>2007-02-03</lastaccessed>
</host>
</counter>
</counters>
</extract>
<br/><br/>.csharpcode, .csharpcode pre<br/>{<br/> font-size: small;<br/> color: black;<br/> font-family: consolas, "Courier New", courier, monospace;<br/> background-color: #ffffff;<br/> /*white-space: pre;*/<br/>}<br/>.csharpcode pre { margin: 0em; }<br/>.csharpcode .rem { color: #008000; }<br/>.csharpcode .kwrd { color: #0000ff; }<br/>.csharpcode .str { color: #006080; }<br/>.csharpcode .op { color: #0000c0; }<br/>.csharpcode .preproc { color: #cc6633; }<br/>.csharpcode .asp { background-color: #ffff00; }<br/>.csharpcode .html { color: #800000; }<br/>.csharpcode .attr { color: #ff0000; }<br/>.csharpcode .alt <br/>{<br/> background-color: #f4f4f4;<br/> width: 100%;<br/> margin: 0em;<br/>}<br/>.csharpcode .lnum { color: #606060; }<br/><br/>.csharpcode, .csharpcode pre<br/>{<br/> font-size: small;<br/> color: black;<br/> font-family: consolas, "Courier New", courier, monospace;<br/> background-color: #ffffff;<br/> /*white-space: pre;*/<br/>}<br/>.csharpcode pre { margin: 0em; }<br/>.csharpcode .rem { color: #008000; }<br/>.csharpcode .kwrd { color: #0000ff; }<br/>.csharpcode .str { color: #006080; }<br/>.csharpcode .op { color: #0000c0; }<br/>.csharpcode .preproc { color: #cc6633; }<br/>.csharpcode .asp { background-color: #ffff00; }<br/>.csharpcode .html { color: #800000; }<br/>.csharpcode .attr { color: #ff0000; }<br/>.csharpcode .alt <br/>{<br/> background-color: #f4f4f4;<br/> width: 100%;<br/> margin: 0em;<br/>}<br/>.csharpcode .lnum { color: #606060; }

Let's see how XML Source treats this.





As you see, it created three outputs - counters, counter, and host - and created _id columns to identify the rows. Our destination table wants all of this data in a single row, which means we'll have to merge.





First we'll mark that the data has been sorted already, using the generated _id columns as the Sort Key. This will let us merge without having to insert a Sort transform.



- Right click on the XML Source, and bring up the Advanced Editor



- Select the host output, and set the IsSorted property to True.





- Expand the host output, and then expand Output Columns. Select the _id field, and set the SortKeyPosition property to 1





Follow the same steps for the counter output. Make sure you set the SortKeyPosition value on the "counter_id" column, and not the "counters_id" column - we're going to ignore the counters output all together.



Add a Merge Join transform. Connect the host and counter outputs from the XML Source to it.





You can then open the merge join transform, and take the columns from both inputs.





Voila!

Version history
Last update:
‎Mar 25 2019 01:59 PM
Updated by: