ce
1 TopicMulti-column statistics
In post SQL Server CE: Multiple single-column statistics, I discussed how SQL Server calculates Multiple single-column statistics. I'm going to talk about the 'multiple-columns statistics' today. Here are examples , adventure 2019 OLTP database is used in this example ------- Data manipulation------------------------------------------- alter database [AdventureWorks2019] set compatibility_level=150 go use [AdventureWorks2019] go if exists(select 1 from sys.tables where name='SalesOrderDetail') drop table SalesOrderDetail go select * into SalesOrderDetail from Sales.SalesOrderDetail---- import all the data into new table SalesOrderDetail. Go --Create two statistics explicitly. The first statistics has two columns create statistics I_ProductID_UnitPrice on SalesOrderDetail(ProductID,UnitPrice) with fullscan create statistics I_UnitPrice on SalesOrderDetail(UnitPrice) with fullscan ------- Data manipulation------------------------------------------- NEW CE in different database compatibility behaves differently, I'm going to demo by one by. NEW CE-2017/2019-db_compatibility_level>=140 a)The selectivity of combined multi single-column stats, the formula is: MAX( min('all density',p0,p1,p2,p3), p0*p1*p2*p3) b)'All density' of the combined columns in multi-column statistics c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3. d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored. 1.New CE with compatibility_level 150 alter database [AdventureWorks2019] set compatibility_level=150 select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile) 1)The estimated row is 341. 2)Here is the estimated row formula: max( min('All density of two columns',P1,P2),P1*P2) *cardinality. a)All density of two columns is :0.001321004 b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256 c)p1 is the selectivity of statistics of the rest of column(UnitPrice), which is 8827/121317=0.07275979 d)cardinality is 121317 dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice) dbcc show_statistics(SalesOrderDetail,I_UnitPrice) 3)Let me pass the values to the formula: max( min('All density of two columns',p0,p1),p0*p1) *cardinality max( min(0.001321004,0.03864256,0.07275979),0.03864256*0.07275979)*121317 =max(0.001321004,0.03864256*0.07275979)*121317 =max(0.001321004,0.0028116245506624)*121317 =0.0028116245506624*121317=341.09785561, rounded down to 341. 4)Enable trace flag 2363 gives you more detail Dbcc traceon(3604,2363) ------------trace flag 2363 output----------------- Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2 Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3 Cardinality using multi-column statistics 0.001321 and with independence assumption 0.00281163. Picking cardinality 0.00281163 Selectivity: 0.00281163 Stats collection generated: CStCollFilter(ID=3, CARD=341.098) CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail) End selectivity computation ------------trace flag 2363 output----------------- NEW CE-2016-db_compatibility_level=130 a)The selectivity of multi single-column stats, the formula is: min('all density',p0,p1,p2,p3) b)'All density' is the one for all columns in multi-column statistics c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3. d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored. Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150 1.New CE with compatibility_level 130 alter database [AdventureWorks2019] set compatibility_level=130 select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile) 1)The estimated row is 160. 2)Here is the estimated row formula: min('all density',p0,p1,p2,p3) *cardinality. a)All density of two columns is :0.001321004 b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256 c)p1 is the selectivity of statistics of the rest of column(UnitPrice), which is 8827/121317=0.07275979 d)cardinality is 121317 dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice) dbcc show_statistics(SalesOrderDetail,I_UnitPrice) 3)Let me pass the values to the formula: min('All density of two columns',p0,p1) *cardinality MIN(0.001321004,0.03864256,0.07275979)*121317=0.001321004*121317=160.260242268, is rounded down to 160. 4)Enable trace flag 2363 gives you more detail Dbcc traceon(3604,2363,) ------------trace flag 2363 output----------------- Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2 Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3 Selectivity: 0.001321 Stats collection generated: CStCollFilter(ID=3, CARD=160.26) CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail) End selectivity computation ------------trace flag 2363 output----------------- NEW CE-2014-db_compatibility_level=120 SQL 2014 is not aware of Multi-columns stats, the algorithm is as same as Multiple single column statistics.formula is: p0 * p1^(1/2) * p2^(1/4)* p3^(1/8) Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150 1.New CE with compatibility_level 120 alter database [AdventureWorks2019] set compatibility_level=120 select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile) 1)The estimated row is 12650. 2)Here is the estimated row formula: p0 * p1^(1/2) *cardinality. a)All density of two columns is :0.001321004 b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256 c)p1 is the selectivity of statistics of the rest of column(UnitPrice), which is 8827/121317=0.07275979 d)cardinality is 121317 dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice) dbcc show_statistics(SalesOrderDetail,I_UnitPrice) 3)Let me pass the values to the formula: p0 * p1^(1/2) *cardinality 0.03864256*0.07275979^(1/2)*121317=0.0104235*121317=1264.547750, is rounded up to 1265. 4)Enable trace flag 2363 gives you more detail Dbcc traceon(3604,2363,) ------------trace flag 2363 output----------------- Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].ProductID from stats with id 2 Loaded histogram for column QCOL: [AdventureWorks2019].[dbo].[SalesOrderDetail].UnitPrice from stats with id 3 Selectivity: 0.0104235 Stats collection generated: CStCollFilter(ID=2, CARD=1264.54) CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail) End selectivity computation ------------trace flag 2363 output-----------------7.2KViews1like0Comments