Question : My customer wants to store data other than English in his table, using the Latin alphabet. For example, he has Spanish, German, Portuguese and English data. He wants to be able to return all the variations of the same character with one query, regardless of whether it has a diacritic mark (accent). For example, if he queries for last name like '%pena%', he wants to get both “Arpeña” and “Arpena” returned. He wants to ensure queries using Tempdb, linked servers, and full-text search work fine as well.
Answer : There could be a few approaches to this, but the easiest solution is to use Accent-Insensitive collation to store this data. What this means is that regardless of accent or diacritic mark added to the character, SQL Server will treat them all as if they do not have one (or as if they all have the same one). Here is an example:
use AccentInsensitiveTest go drop table t1 create table t1 (c1 int, c2 nvarchar(2000) collate SQL_Latin1_General_CP1_CI_AI) --Can use Latin1_General_CI_AI also
--note that we are inserting two values one with the ñ and one using n
insert into t1 values (1, 'Arpena'), (2, 'Arpeña')
--now select the data out using either version of the character
select * from t1 where c2 like '%pen%'
select * from t1 where c2 like '%peñ%'
Results – as you can see regardless of diacritic mark (accent), SQL Server returns both rows:
Full-Text search is built to locate words in a sentence and then discover the stem of each word. Thus, full-text search is “smart” in recognizing that for example ‘el’ is different from ‘él’ in Spanish. Therefore, Full-text search cannot be “fooled” to return both words at the same time with one query or one clause. Two different clauses would be necessary. In the specific case of querying for last name, full-text is not a solution because names are not really “proper” words in a language (unless the name are in fact language nouns like “joy”).
Linked Servers Considerations
Running a linked server query against the accent-insensitive table, should be pretty transparent.
select * from openquery ([mySQLServer\sql2008r2], 'select * from [AccentInsensitiveTest].dbo.t1 where c2 like ''%peñ%''')
select * from [mySQLServer\sql2008r2].[AccentInsensitiveTest].dbo.t1 where c2 like '%peñ%'
For any additional linked server considerations, review this MSDN article