Data Quality Services (DQS) internally stores the domain values of a knowledge base in Unicode, and uses a trigram algorithm, which is language agnostic, to compare the domain values with your source data during cleansing and matching. In practice, therefore, you can use DQS to cleanse and match data in any language that the Windows operating system supports.
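To illustrate why a trigram comparison does not depend on the language, here is a minimal sketch of a generic trigram similarity over Unicode strings. It is not the DQS implementation, whose details are not described here; the function names, the space padding, and the Jaccard-style score are all assumptions made for the illustration.

```python
def trigrams(text):
    """Return the set of 3-character substrings of a padded, case-folded string.

    Hypothetical helper: the two leading spaces and one trailing space are an
    illustrative padding choice so that short strings still yield trigrams.
    """
    padded = "  " + text.casefold() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def trigram_similarity(a, b):
    """Score two strings by shared trigrams divided by all distinct trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


if __name__ == "__main__":
    # The same code handles Latin, Greek, or Chinese text, because it only
    # looks at sequences of Unicode characters, never at a dictionary.
    for left, right in [("Seattle", "Seatle"), ("Αθήνα", "Αθηνα"), ("北京市", "北京")]:
        print(left, right, round(trigram_similarity(left, right), 2))
```

Because the comparison works purely on character sequences, nothing in it needs to know which language the values are written in.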
OK! But why is there only a limited set of language options when creating a string domain?
While creating a string domain in DQS, you can specify the language for the domain. The Language drop-down list displays a limited set of languages, and this selection applies only to the Speller feature in DQS. The Speller feature works only for the languages that are listed in the Language drop-down list. However, if you have values in a non-listed language (for example, Greek or Chinese), you must select Other in the Language drop-down list. Selecting Other in the Language drop-down list disables the Speller feature for the domain.
Does the collation setting affect cleansing and matching in DQS?
The collation setting determines the rules for comparing data in SQL Server. Although DQS stores all values in Unicode, the collation setting still influences the comparison rules. For example, characters that one collation treats as different might be treated as the same by another collation. Therefore, choose a collation other than the default when installing DQS only if you fully understand its comparison rules and are sure that you want DQS to use those rules for cleansing and matching.
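The effect of different comparison rules can be sketched in a few lines. The example below only imitates case sensitivity and accent sensitivity, the kind of options that SQL Server collation names encode with suffixes such as CI/CS and AI/AS; it is not SQL Server's collation logic, and the helper names are hypothetical.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def equal_under(a, b, case_sensitive, accent_sensitive):
    """Compare two strings under simplified, collation-like rules (illustrative only)."""
    if not accent_sensitive:
        a, b = strip_accents(a), strip_accents(b)
    if not case_sensitive:
        a, b = a.casefold(), b.casefold()
    return a == b


if __name__ == "__main__":
    pair = ("résumé", "Resume")
    # The same two values match or differ depending on the rules in force.
    print(equal_under(*pair, case_sensitive=False, accent_sensitive=False))
    print(equal_under(*pair, case_sensitive=False, accent_sensitive=True))
    print(equal_under(*pair, case_sensitive=True, accent_sensitive=False))
```

Two values that count as duplicates under one set of rules are distinct under another, which is why the collation chosen at installation can change cleansing and matching results.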
However, if you are not sure about the comparison rules of a collation, install DQS with the default server collation, which should work well. For more information about installing DQS and specifying collation settings, see