SQL Server 2012 Data Quality Services (DQS) enables you to cleanse data using your own Knowledge Base. In this blog article, I am going to show how to create a simple Knowledge Base which you can use to cleanse your customer data.
The steps for using DQS to cleanse data is as follows:
A. Create DQS Knowledge Base
B. Create a DQS project to cleanse your data using the Knowledge Base
For this example, I am using the following sample data - you can run the following script on your SQL Server database :
CREATE TABLE MyCustomers
INSERT INTO MyCustomers
VALUES (1, 'Consolidate Co Ltd', 'Miami', 'FL','2013-01-01'),
(2, 'Consolidation Company Ltd', 'New York', 'NY','2013-01-01'),
(3, N'什锦的件', 'LA', 'CA','2013-01-01'),
(4, 'Chop-suey Chinese', 'Los Angeles', 'CA', '2013-03-03'),
(5, 'Big Cheese, The', 'Redmond', 'WA', '2013-02-02'),
(6, 'THE BIG CHEESE', 'Chicago', 'Il','2013-02-02'),
(7, 'To Be Filled Later', 'Redmond', 'Wash.', '2013-01-01')
Note: by default, DQS include DQS_NULL as valid value for the domain, you can change the type to invalid if you would like the record with missing value to be flagged
Creating your own DQS Knowledge Base sometimes requires a lot of effort. For things such as address cleansing, phone number cleansing, creating your own complete list of all valid and invalid values can be a huge effort. DQS supports integration with third party service provider to cleanse your data through DQS. Refer to my blog article on how to cleanse Customer Data using Dun & Bradstreet for more information.
In this blog article, I discussed how to create a Knowledge Base to cleanse customer data, including use of domain values and term based relations. You can then refer to the Knowledge Base to create Data Quality Projects to cleanse your data. You can use the same knowledge base to perform cleansing on many Data Quality Projects. You can also automate the cleansing using SQL Server 2012 Integration Services. Matt Mason wrote a nice article : Overview of DQS Transform that describes the SSIS DQS Cleansing transform.
You may also notice that there appears to be duplicate records in the sample data I used. In the next article , I will describe on how to enhance your knowledge base by adding matching policy and identify duplicate and related records in your dataset.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.