First published on MSDN on Feb 04, 2013
It has been a while since there has been activity on this blog, we as a team will be trying to post weekly going forwards so as to share what we do on a regular basis – happy reading!
I was posed a question by a fellow PFE during his performance `engagement. The specific question was whether a Literal predicate which had different accent sensitivity or collation when compared to a column would trigger an implicit conversion resulting in potential performance degradation and more importantly where in the plan would you see this.
Collation settings exist at the Server level, the Database Level and potentially defined at the column level as well. By default if no collation is specified at the column level when creating the table, database collation is assumed
To check the collation of a database:
SELECT DATABASEPROPERTYEX('TEMPDB','COLLATION')
And further to see the collation of a column, we can use the query below
SELECT object_name(object_id) AS ObjectName ,name AS ColName ,collation_name FROM sys.columns WHERE object_id = object_id('testcollate')
Now moving on to more of the Performance aspects of the question:
Same Collation comparison – If the literal or columns being compared are the same collection, we have no problem as we can see below
SET NOCOUNT ON USE tempdb GO DROP TABLE testcollate GO CREATE TABLE testcollate ( myid INT identity ,myname VARCHAR(810) ) GO INSERT INTO testcollate VALUES (replicate('a', 800)) GO 10000 INSERT INTO testcollate VALUES ('Denzil') GO CREATE INDEX myind ON testcollate (myname) GO SET STATISTICS IO ON GO SELECT myname FROM testcollate WHERE myname = 'Denzil'
Table 'testcollate'. Scan count 1, logical reads 5, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
If the literal being compared has a different collation or is explicitly converted to a different collation, collation precedence kicks in- http://msdn.microsoft.com/en-US/library/ms179886(v=SQL.90).aspx
If the literal has an explicit collation, we will get a plan with a CONVERT/SCAN. The CONVERT_IMPLICIT by itself gives no indication this is due to a Collation difference per say, in fact almost looks like it could be some data type mismatch which it is not and on the constant side, there is a CONVERT given that we were explicitly collating it to a particular collation.
SELECT myname FROM testcollate WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS
Table 'testcollate'. Scan count 1, logical reads 1240, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
You will have to look at the input/output trees to actually see where the change in collation is happening as that is not exposed in the query plan itself as far as I know. I am using QUERYTRACEON which is an undocumented command in order to demonstrate collation related converts. QueryTraceON is blogged about in several places – See Benjamin blog ( Query Optimizer Trace Flags )
SELECT myname FROM testcollate WHERE myname = 'Denzil' collate SQL_Latin1_General_Cp437_CS_AS OPTION ( RECOMPILE ,QueryTraceon 8606 ) GO
**************************************** *** Input Tree: *** LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname LogOp_Select LogOp_Get TBL: testcollate testcollate TableID=773577794 TableReferenceID=0 IsRow: COL: IsBaseRow1001 ScaOp_Comp x_cmpEq ScaOp_Convert varchar collate 520142856,Null,Var,Trim,ML=810 ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil)) AncOp_PrjList
In order to get the Collation Name of that ID:
SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))
If the Column has different collation than the database would we need to collate the literal to the column itself? Is the constant or literal collated to the collation of the Database or the collation of the column given they both are different?
USE tempdb GO DROP TABLE testcollate GO CREATE TABLE testcollate ( myid INT identity PRIMARY KEY ,myname VARCHAR(810) collate SQL_Latin1_General_Cp437_CS_AS ) GO INSERT INTO testcollate VALUES (replicate('a', 800)) GO 10000 INSERT INTO testcollate VALUES ('Denzil') GO CREATE INDEX myind ON testcollate (myname) GO --As you can see below, the Database collation is different than the Column collation SELECT DATABASEPROPERTYEX('TEMPDB', 'COLLATION') AS DBCollation ,object_name(object_id) AS ObjectName ,name AS ColName ,collation_name AS ColumnCollation FROM sys.columns WHERE object_id = object_id('testcollate')
SELECT * FROM testcollate WHERE myname = 'Denzil'
We actually get a Seek here, which means the literal here was converted to the collation of the column and not the database.
SELECT * FROM testcollate WHERE myname = 'Denzil' OPTION ( RECOMPILE ,QueryTraceon 8606 )
**************************************** *** Input Tree: *** LogOp_Project QCOL: [tempdb].[dbo].[testcollate].myid QCOL: [tempdb].[dbo].[testcollate].myname LogOp_Select LogOp_Get TBL: testcollate testcollate TableID=741577680 TableReferenceID=0 IsRow: COL: IsBaseRow1001 ScaOp_Comp x_cmpEq ScaOp_Identifier QCOL: [tempdb].[dbo].[testcollate].myname ScaOp_Const TI(varchar collate 520142856,Var,Trim,ML=6) XVAR(varchar,Owned,Value=Len,Data = (6,Denzil)) AncOp_PrjList
Here you can see the Constant is being collated to the Column collation and not the database collation.
SELECT convert(SYSNAME, collationpropertyfromid(520142856, 'name'))
There have been several blogs on Collation conflict and how to resolve that so I intentionally stayed away from that. Arvind also has a blog on some collation and performance scenarios which is a great read - SQL collation and performance
-Denzil Ribeiro , SQL Dedicated Premier Field Engineer
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.