SQL Collation and related performance impact, viewing collation in query plans
Published Mar 15 2019 11:42 AM 8,120 Views
Brass Contributor

First published on MSDN on Feb 04, 2013

It has been a while since there has been activity on this blog, we as a team will be trying to post weekly going forwards so as to share what we do on a regular basis – happy reading!

 

I was posed a question by a fellow PFE during his performance `engagement. The specific question was whether a Literal predicate which had different accent sensitivity or collation when compared to a column would trigger an implicit conversion resulting in potential performance degradation and more importantly where in the plan would you see this.

 

Collation settings exist at the Server level, the Database Level and potentially defined at the column level as well. By default if no collation is specified at the column level when creating the table, database collation is assumed

 

To check the collation of a database:

SELECT DATABASEPROPERTYEX('TEMPDB','COLLATION')

 

And further to see the collation of a column, we can use the query below

SELECT object_name(object_id) AS ObjectName
	,name AS ColName
	,collation_name
FROM sys.columns
WHERE object_id = object_id('testcollate')

Now moving on to more of the Performance aspects of the question:

Same Collation comparison – If the literal or columns being compared are the same collection, we have no problem as we can see below

SET NOCOUNT ON

USE tempdb
GO

DROP TABLE testcollate
GO

CREATE TABLE testcollate (
	myid INT identity
	,myname VARCHAR(810)
	)
GO

INSERT INTO testcollate
VALUES (replicate('a', 800)) GO 10000

INSERT INTO testcollate
VALUES ('Denzil')
GO

CREATE INDEX myind ON testcollate (myname)
GO

SET STATISTICS IO ON
GO

SELECT myname
FROM testcollate
WHERE myname = 'Denzil'

 

Table 'testcollate'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If the literal being compared has a different collation or is explicitly converted to a different collation, collation precedence kicks in- http://msdn.microsoft.com/en-US/library/ms179886(v=SQL.90).aspx

If the literal has an explicit collation, we will get a plan with a CONVERT/SCAN. The CONVERT_IMPLICIT by itself gives no indication this is due to a Collation difference per say, in fact almost looks like it could be some data type mismatch which it is not and on the constant side, there is a CONVERT given that we were explicitly collating it to a particular collation.

SELECT myname
FROM testcollate
WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS

Table 'testcollate'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You will have to look at the input/output trees to actually see where the change in collation is happening as that is not exposed in the query plan itself as far as I know. I am using QUERYTRACEON which is an undocumented command in order to demonstrate collation related converts. QueryTraceON is blogged about in several places – See Benjamin blog ( Query Optimizer Trace Flags )

SELECT myname
FROM testcollate
WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS
OPTION (
	RECOMPILE
	,QueryTraceon 8606
	)
GO
****************************************
*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=773577794 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

                    ScaOp_Convert varchar collate 520142856,Null,Var,Trim,ML=810

                        ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList 

In order to get the Collation Name of that ID:

SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))

If the Column has different collation than the database would we need to collate the literal to the column itself?  Is the constant or literal collated to the collation of the Database or the collation of the column given they both are different?

USE tempdb
GO

DROP TABLE testcollate
GO

CREATE TABLE testcollate (
	myid INT identity PRIMARY KEY
	,myname VARCHAR(810) collate SQL_Latin1_General_Cp437_CS_AS
	)
GO

INSERT INTO testcollate
VALUES (replicate('a', 800)) GO 10000

INSERT INTO testcollate
VALUES ('Denzil')
GO

CREATE INDEX myind ON testcollate (myname)
GO

--As you can see below, the Database collation is different than the Column collation
SELECT DATABASEPROPERTYEX('TEMPDB', 'COLLATION') AS DBCollation
	,object_name(object_id) AS ObjectName
	,name AS ColName
	,collation_name AS ColumnCollation
FROM sys.columns
WHERE object_id = object_id('testcollate')

SELECT *
FROM testcollate
WHERE myname = 'Denzil'

We actually get a Seek here, which means the literal here was converted to the collation of the column and not the database.

SELECT *
FROM testcollate
WHERE myname = 'Denzil'
OPTION (
	RECOMPILE
	,QueryTraceon 8606
	)
****************************************
*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=741577680 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList

Here you can see the Constant is being collated to the Column collation and not the database collation.

SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))

There have been several blogs on Collation conflict and how to resolve that so I intentionally stayed away from that. Arvind also has a blog on some collation and performance scenarios which is a great read - SQL collation and performance

-Denzil Ribeiro , SQL Dedicated Premier Field Engineer

1 Comment
Version history
Last update:
‎Apr 28 2020 12:51 PM
Updated by: