First published on MSDN on Dec 15, 2007
As I wrote in my
previous post
, complex XML documents will produce multiple outputs when you're using the XML Source adapter. Most of the time it will be easier to pre-process your source file with XSLT to de-normalize it a bit. Reducing the number of outputs greatly simplifies your data flow.
Let's take the same XML document I used in the last example:
<extract date="2007-12-05">
<counters>
<counter category="dispatcher" name="server1">
<runtime>6</runtime>
<queue>3</queue>
<maxrequest>8</maxrequest>
<color>blue</color>
<host>
<name>svo2555</name>
<path>\\dispatcher</path>
<lastaccessed>2007-02-03</lastaccessed>
</host>
</counter>
<counter category="gateway" name="server1">
<runtime>1</runtime>
<queue>10</queue>
<maxrequest>10</maxrequest>
<color>purple</color>
<host>
<name>svo2555</name>
<path>\\gateway</path>
<lastaccessed>2007-02-03</lastaccessed>
</host>
</counter>
</counters>
</extract>
We want to flatten this out a bit using an XSL transform like this one (forgive my novice XSLT skills):
<xsl:stylesheet version="1.0" xmlns:xsl="
http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/extract">
<xsl:variable name="extractDate" select="/extract/@date" />
<counters>
<xsl:for-each select="counters/counter">
<counter>
<extractDate><xsl:value-of select="$extractDate"/></extractDate>
<category><xsl:value-of select="@category"/></category>
<name><xsl:value-of select="@name"/></name>
<runtime><xsl:value-of select="runtime"/></runtime>
<queue><xsl:value-of select="queue"/></queue>
<maxrequest><xsl:value-of select="maxrequest"/></maxrequest>
<color><xsl:value-of select="color"/></color>
<hostName><xsl:value-of select="host/name"/></hostName>
<path><xsl:value-of select="host/path"/></path>
<lastaccessed><xsl:value-of select="host/lastaccessed"/></lastaccessed>
</counter>
</xsl:for-each>
</counters>
</xsl:template>
</xsl:stylesheet>
We'll apply the transform with an XML Task. Add one to your package, and open the editor. You'll want to change the Operation Type property to XSLT, set SaveOperationResult to true, and set all of the file connections.
Note, the Source should be your XML source document and the SecondOperand is your XSLT document.
The processed XML looks like this:
<?xml version="1.0" encoding="utf-8"?>
<counters>
<counter>
<extractDate>2007-12-05</extractDate>
<category>dispatcher</category>
<name>server1</name>
<runtime>6</runtime>
<queue>3</queue>
<maxrequest>8</maxrequest>
<color>blue</color>
<hostName>svo2555</hostName>
<path>\\dispatcher</path>
<lastaccessed>2007-02-03</lastaccessed>
</counter>
<counter>
<extractDate>2007-12-05</extractDate>
<category>gateway</category>
<name>server1</name>
<runtime>1</runtime>
<queue>10</queue>
<maxrequest>10</maxrequest>
<color>purple</color>
<hostName>svo2555</hostName>
<path>\\gateway</path>
<lastaccessed>2007-02-03</lastaccessed>
</counter>
</counters>
Add a Data Flow Task, and setup your XML source to use the processed XML document. You'll need to update/regenerate the schema for your document to account for the new format. Notice there is now only one output to deal with.