First published on MSDN on Aug 08, 2011
Null, space, dummy, “unknown”, 99999, empty. All these small and innocent words represent missing data that may have a bad impact on our business execution and analysis.
How can a customer participate in my new email campaign if he doesn’t have an email?
How can I ship my merchandise quickly and get mailing discounts from the post office if my data misses a zip code?
How can I report correctly on my business if my Segment field has the Value “Unknown”?
So how DQS addresses the case of missing information?
Each domain has a seeded value – DQS_NULL, which represents null values. When the data has a null or any number of spaces, it will be associated with this value.
This DQS_NULL cannot be deleted and can be set as Correct or Invalid.
The following diagram presnets the DQS_NULL in domain values:
Null Equivalence is related to the cases where your data consists actual values that are equivalent to an empty field. Common examples are “Unknown” , “Do not Know”, or 99999 for numeric values and 1/1/1111 for date values. To define these values as Null Equivalence, all you need to do is add them to the domain, and then link them to the
value as synonyms.
The following diagrams presents Null equivalence values in Domain values:
DQS Integrated profiling has a specific metric that tracks missing values. This metric will track all the nulls and null equivalent values in your data.
So if you’ve set the “Unknown” and DQS_NULL as null equivalent and your data has 5% populated with spaces and 10% populated with “Unknown” , the completeness metric will show that 15% of the data is missing.
The following diagrams presents the profiling completeness in DQS activity, before and after the definition of Null Equivalence:
Missing data is Invalid
If you want, you can set the DQS_NULL and all the null equivalence values as invalid, or set a domain rule that will set them to Invalid.
The following diagrams present In Valid values in Domain Management and how they are tracked in a cleansing project:
This concludes our discussion on Null Values .
Now go out and hunt them down !!!