Blog Post

SQL Server Integration Services (SSIS) Blog
2 MIN READ

Pattern: Replacing NULL values with a Derived Column

SSIS-Team's avatar
SSIS-Team
Copper Contributor
Mar 25, 2019
First published on MSDN on Apr 05, 2011

This pattern has been out there for so long I had contemplated taking it out of my http://blogs.msdn.com/b/mattm/archive/2010/06/29/ssis-performance-design-patterns-video.aspx talk, but it looks like it still managed to help some people at http://blogs.msdn.com/b/mattm/archive/2011/03/07/speaking-at-sql-saturday-71-boston.aspx last weekend. This pattern is straight forward, and especially important since the data flow threading improvements were made in 2008.

Problem

Your source data is “unclean” – specifically, it has NULL or missing values. You’d like to replace these NULL values with hard coded defaults, and replace missing dimension values with an http://technet.microsoft.com/en-us/library/ms170707.aspx key.

Solution 1: Slow

This is the traditional approach (and perhaps the more natural way of writing it) is to have a series of Lookups checking for existing dimension values. If the value is not found, you map the “No Match” output to a derived column to substitute in your http://sqlblog.com/blogs/jamie_thomson/archive/2009/10/14/considering-surrogate-keys-for-unknown-members.aspx . You then Union All the two streams, and continue down to the next dimension member Lookup.

This pattern was especially slow in the 2005 due to a bug in the BIDS designer that would cause columns to be unnecessarily compared to each other (increasing exponentially for ever pair of Lookup + Union All components).

Solution 2: Fast

The main problem with Solution 1 is that the Union All component is asynchronous – it will create a full copy of the data flow buffer. It’s also completely unneeded. Instead of using the “No Match” output, we can choose to Ignore missing values. The keys for any missing values will be set to NULL. After all Lookups are complete, we perform all of the NULL value substitutions with a single derived column transformation.

Updated Mar 25, 2019
Version 2.0
No CommentsBe the first to comment