First published on MSDN on Aug 08, 2011
Null, space, dummy, “unknown”, 99999, empty. All these small and innocent words represent missing data that may have a bad impact on our business execution and analysis.
-
How can a customer participate in my new email campaign if he doesn’t have an email?
-
How can I ship my merchandise quickly and get mailing discounts from the post office if my data misses a zip code?
-
How can I report correctly on my business if my Segment field has the Value “Unknown”?
So how DQS addresses the case of missing information?
-
DQS_NULL
Each domain has a seeded value – DQS_NULL, which represents null values. When the data has a null or any number of spaces, it will be associated with this value.
This DQS_NULL cannot be deleted and can be set as Correct or Invalid.
The following diagram presnets the DQS_NULL in domain values:
-
Null equivalence
Null Equivalence is related to the cases where your data consists actual values that are equivalent to an empty field. Common examples are “Unknown” , “Do not Know”, or 99999 for numeric values and 1/1/1111 for date values. To define these values as Null Equivalence, all you need to do is add them to the domain, and then link them to the
DQS_NULL
value as synonyms.
The following diagrams presents Null equivalence values in Domain values:
-
Completeness
DQS Integrated profiling has a specific metric that tracks missing values. This metric will track all the nulls and null equivalent values in your data.
So if you’ve set the “Unknown” and DQS_NULL as null equivalent and your data has 5% populated with spaces and 10% populated with “Unknown” , the completeness metric will show that 15% of the data is missing.
The following diagrams presents the profiling completeness in DQS activity, before and after the definition of Null Equivalence:
-
Missing data is Invalid
If you want, you can set the DQS_NULL and all the null equivalence values as invalid, or set a domain rule that will set them to Invalid.
The following diagrams present In Valid values in Domain Management and how they are tracked in a cleansing project:
This concludes our discussion on Null Values .
Now go out and hunt them down !!!