SQL Collation and related performance impact, viewing collation in query plans
Published Mar 15 2019 11:42 AM 7,382 Views
Brass Contributor

First published on MSDN on Feb 04, 2013

It has been a while since there has been activity on this blog, we as a team will be trying to post weekly going forwards so as to share what we do on a regular basis – happy reading!

 

I was posed a question by a fellow PFE during his performance `engagement. The specific question was whether a Literal predicate which had different accent sensitivity or collation when compared to a column would trigger an implicit conversion resulting in potential performance degradation and more importantly where in the plan would you see this.

 

Collation settings exist at the Server level, the Database Level and potentially defined at the column level as well. By default if no collation is specified at the column level when creating the table, database collation is assumed

 

To check the collation of a database:

SELECT DATABASEPROPERTYEX('TEMPDB','COLLATION')

 

And further to see the collation of a column, we can use the query below

SELECT object_name(object_id) AS ObjectName
	,name AS ColName
	,collation_name
FROM sys.columns
WHERE object_id = object_id('testcollate')

Now moving on to more of the Performance aspects of the question:

Same Collation comparison – If the literal or columns being compared are the same collection, we have no problem as we can see below

SET NOCOUNT ON

USE tempdb
GO

DROP TABLE testcollate
GO

CREATE TABLE testcollate (
	myid INT identity
	,myname VARCHAR(810)
	)
GO

INSERT INTO testcollate
VALUES (replicate('a', 800)) GO 10000

INSERT INTO testcollate
VALUES ('Denzil')
GO

CREATE INDEX myind ON testcollate (myname)
GO

SET STATISTICS IO ON
GO

SELECT myname
FROM testcollate
WHERE myname = 'Denzil'

 

Table 'testcollate'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If the literal being compared has a different collation or is explicitly converted to a different collation, collation precedence kicks in- http://msdn.microsoft.com/en-US/library/ms179886(v=SQL.90).aspx

If the literal has an explicit collation, we will get a plan with a CONVERT/SCAN. The CONVERT_IMPLICIT by itself gives no indication this is due to a Collation difference per say, in fact almost looks like it could be some data type mismatch which it is not and on the constant side, there is a CONVERT given that we were explicitly collating it to a particular collation.

SELECT myname
FROM testcollate
WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS

Table 'testcollate'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

You will have to look at the input/output trees to actually see where the change in collation is happening as that is not exposed in the query plan itself as far as I know. I am using QUERYTRACEON which is an undocumented command in order to demonstrate collation related converts. QueryTraceON is blogged about in several places – See Benjamin blog ( Query Optimizer Trace Flags )

SELECT myname
FROM testcollate
WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS
OPTION (
	RECOMPILE
	,QueryTraceon 8606
	)
GO
****************************************
*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=773577794 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

                    ScaOp_Convert varchar collate 520142856,Null,Var,Trim,ML=810

                        ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList 

In order to get the Collation Name of that ID:

SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))

If the Column has different collation than the database would we need to collate the literal to the column itself?  Is the constant or literal collated to the collation of the Database or the collation of the column given they both are different?

USE tempdb
GO

DROP TABLE testcollate
GO

CREATE TABLE testcollate (
	myid INT identity PRIMARY KEY
	,myname VARCHAR(810) collate SQL_Latin1_General_Cp437_CS_AS
	)
GO

INSERT INTO testcollate
VALUES (replicate('a', 800)) GO 10000

INSERT INTO testcollate
VALUES ('Denzil')
GO

CREATE INDEX myind ON testcollate (myname)
GO

--As you can see below, the Database collation is different than the Column collation
SELECT DATABASEPROPERTYEX('TEMPDB', 'COLLATION') AS DBCollation
	,object_name(object_id) AS ObjectName
	,name AS ColName
	,collation_name AS ColumnCollation
FROM sys.columns
WHERE object_id = object_id('testcollate')

SELECT *
FROM testcollate
WHERE myname = 'Denzil'

We actually get a Seek here, which means the literal here was converted to the collation of the column and not the database.

SELECT *
FROM testcollate
WHERE myname = 'Denzil'
OPTION (
	RECOMPILE
	,QueryTraceon 8606
	)
****************************************
*** Input Tree: ***

        LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname

            LogOp_Select

                LogOp_Get TBL: testcollate testcollate TableID=741577680 TableReferenceID=0 IsRow: COL: IsBaseRow1001

                ScaOp_Comp x_cmpEq

                    ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname

                    ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil))

            AncOp_PrjList

Here you can see the Constant is being collated to the Column collation and not the database collation.

SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))

There have been several blogs on Collation conflict and how to resolve that so I intentionally stayed away from that. Arvind also has a blog on some collation and performance scenarios which is a great read - SQL collation and performance

-Denzil Ribeiro , SQL Dedicated Premier Field Engineer

1 Comment
Copper Contributor

Hi there. A few clarifications need to be made to the info presented here.

 

whether a Literal predicate which had different accent sensitivity or collation when compared to a column would trigger an implicit conversion resulting in potential performance degradation

  1. String literals and variables initially use the database's default collation as their collation.
  2. Assuming no explicit collation set via COLLATE clause on the predicate, the collation of the literal or variable is irrelevant (from a performance perspective, though for VARCHAR data it can be relevant if any characters having values 128 - 255 / 0x80 - 0xFF are present) since collation precedence uses the collation of the column.
  3. implicit conversions, by themselves, are not the issue, as whatever performance degradation they cause is minimal and likely barely noticeable.

 

 

If the literal being compared has a different collation or is explicitly converted to a different collation, collation precedence kicks in

  1. yes, if there is any difference in collation, for any reason, collation precedence will determine which collation to use, of if both sides have equal weight, then an error will be thrown.
  2. a literal (or variable) always starts out with the database's default collation. But this doesn't matter since collation precedence will use the collation of the column.
  3. an explicit collation (i.e. set via the COLLATE {collation_name} clause on a predicate) will override the column's collation, but if there is any degradation in performance in this scenario (and that's usually only possible, though not even guaranteed, if the column is indexed), then nobody should really be surprised because it was an explicit change that is clearly stated in the query ;)

 

Is the literal collated to the collation of the Database or the collation of the column given they both are different? ... We actually get a Seek here, which means the literal here was converted to the collation of the column and not the database.

The question and answer here are both poorly phrased and hence can be misleading.

  1. the actual answer to the question is: BOTH. Initially (i.e. when the query is being parsed), literals and variables use the database's default collation. Then, if compared to a column having a different collation than the database (or COLLATE clause is used), the collation will be converted to the column's collation (or to what is specified by the COLLATE clause).
  2. literals (and variables) need to have a collation, so it doesn't make sense that they would be converted to the database's collation; they are already using the database's collation.

 

With regards to a collation difference causing performance degradation, that can only really happen under the following conditions:

  1. column is indexed, and ...
  2. COLLATE clause is specified on the predicate
  3. COLLATE clause specifies a different collation than the column is using (and hence was used in creating the index)

 

However, the scenario that typically does cause a performance degradation and is characterized as being caused by an "implicit conversion" (even though that isn't the real issue) is datatype precedence:

  1. column is VARCHAR.
  2. column is using a SQL Server collation (i.e. name starting with "SQL_").
  3. column is indexed.
  4. column is compared to NVARCHAR literal or variable, or NVARCHAR column using the same collation (different collation would simply error)

I have a detailed write-up about this here:

Impact on Indexes When Mixing VARCHAR and NVARCHAR Types

 

Take care,
Solomon...

https://SqlQuantumLift.com/
https://SqlQuantumLeap.com/
https://SQLsharp.com/

 

Version history
Last update:
‎Apr 28 2020 12:51 PM
Updated by: