The sound of silence – null, empty, space, missing, dummy and other animals
Published Mar 25 2019 03:12 PM 523 Views
Copper Contributor
First published on MSDN on Aug 08, 2011

Null, space, dummy, “unknown”, 99999, empty.  All these small and innocent words represent missing data that may have a bad impact on our business execution and analysis.

  • How can a customer participate in my new email campaign if he doesn’t have an email?

  • How can I ship my merchandise quickly and get mailing discounts from the post office if my data misses a zip code?

  • How can I report correctly on my business if my Segment field has the Value “Unknown”?

So how DQS addresses the case of missing information?

    Each domain has a seeded value – DQS_NULL, which represents null values. When the data has a null or any number of spaces, it will be associated with this value.
    This DQS_NULL cannot be deleted and can be set as Correct or Invalid.

    The following diagram presnets the DQS_NULL in domain values:

  • Null equivalence
    Null Equivalence is related to the cases where your data consists actual values that are equivalent to an empty field. Common examples are “Unknown” , “Do not Know”, or 99999 for numeric values and 1/1/1111 for date values. To define these values as Null Equivalence, all you need to do is add them to the domain, and then link them to the DQS_NULL value as synonyms.

    The following diagrams presents Null equivalence values in Domain values:

  • Completeness
    DQS Integrated profiling has a specific metric that tracks missing values. This metric will track all the nulls and null equivalent values in your data.
    So if you’ve set the “Unknown” and DQS_NULL as null equivalent and your data has 5% populated with spaces and 10% populated with “Unknown” , the completeness metric will show that 15% of the data is missing.

    The following diagrams presents the profiling completeness in DQS activity, before and after the definition of Null Equivalence:

  • Missing data is Invalid
    If you want, you can set the DQS_NULL and all the null equivalence values as invalid, or set a domain rule that will set them to Invalid.

    The following diagrams present In Valid values in Domain Management and how they are tracked in a cleansing project:

This concludes our discussion on Null Values .

Now go out and hunt them down !!!

Version history
Last update:
‎Mar 25 2019 03:12 PM
Updated by: