Multi-column statistics
Published Nov 01 2022 11:16 AM 1,001 Views
Microsoft

Multi-column statistics

In post SQL Server CE: Multiple single-column statistics, I discussed how SQL Server calculates Multiple single-column statistics. I'm going to talk about the 'multiple-columns statistics' today.

Here are examples , adventure 2019 OLTP  database is used in this example

------- Data manipulation-------------------------------------------

go

go

if exists(select 1 from sys.tables where name='SalesOrderDetail')

drop table SalesOrderDetail

go

select * into SalesOrderDetail from Sales.SalesOrderDetail---- import all the data into new table SalesOrderDetail.

Go

--Create two statistics explicitly. The first statistics has two columns

create statistics I_ProductID_UnitPrice on SalesOrderDetail(ProductID,UnitPrice) with fullscan

create statistics I_UnitPrice  on SalesOrderDetail(UnitPrice) with fullscan

------- Data manipulation-------------------------------------------

NEW CE in different database compatibility  behaves differently, I'm going to demo by one by.

NEW CE-2017/2019-db_compatibility_level>=140

a)The selectivity of combined multi single-column stats, the formula is: MAX(  min('all density',p0,p1,p2,p3),   p0*p1*p2*p3)

b)'All density' of the combined columns  in  multi-column statistics

c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

1.New CE with compatibility_level 150

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

1)The estimated row is 341.

2)Here is the estimated row formula:   max(  min('All density of two columns',P1,P2),P1*P2)  *cardinality.

a)All density of two columns is :0.001321004

b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

3)Let me pass the values to the formula:  max(  min('All density of two columns',p0,p1),p0*p1)  *cardinality

max(  min(0.001321004,0.03864256,0.07275979),0.03864256*0.07275979)*121317

=max(0.001321004,0.03864256*0.07275979)*121317

=max(0.001321004,0.0028116245506624)*121317

=0.0028116245506624*121317=341.09785561, rounded down to 341.

4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363)

------------trace flag 2363 output-----------------

Cardinality using multi-column statistics 0.001321 and with independence assumption 0.00281163. Picking cardinality 0.00281163

Selectivity: 0.00281163

Stats collection generated:

CStCollFilter(ID=3, CARD=341.098)

CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

------------trace flag 2363 output-----------------

NEW CE-2016-db_compatibility_level=130

a)The selectivity of multi single-column stats, the formula is: min('all density',p0,p1,p2,p3)

b)'All density' is the one for all columns  in  multi-column statistics

c)p0,p1,p2,p3 are the selectivity of each value of the column in WHERE clause and P0<p1<p2<p3.

d)If the multi-columns has more than 4 columns, it only counts the first 4, the rest of them are ignored.

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

1.New CE with compatibility_level 130

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

1)The estimated row is 160.

2)Here is the estimated row formula:   min('all density',p0,p1,p2,p3)  *cardinality.

a)All density of two columns is :0.001321004

b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

3)Let me pass the values to the formula:   min('All density of two columns',p0,p1)  *cardinality

MIN(0.001321004,0.03864256,0.07275979)*121317=0.001321004*121317=160.260242268, is rounded down to 160.

4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Selectivity: 0.001321

Stats collection generated:

CStCollFilter(ID=3, CARD=160.26)

CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

------------trace flag 2363 output-----------------

NEW CE-2014-db_compatibility_level=120

SQL 2014 is not aware of Multi-columns stats, the algorithm is as same as Multiple single column statistics.formula is: p0 * p1^(1/2) * p2^(1/4)* p3^(1/8)

Note, if trace flag 4199 is enabled, this rule is as same as the rule in DB_compatibility_level 140/150

1.New CE with compatibility_level 120

select * from SalesOrderDetail where ProductID=870 and UnitPrice=4.99 option(recompile)

1)The estimated row is 12650.

2)Here is the estimated row formula:   p0 * p1^(1/2) *cardinality.

a)All density of two columns is :0.001321004

b)p0 is the selectivity of statistics of one column(ProductID) in where clause, which is 4688/121317=0.03864256

c)p1 is the selectivity of statistics of the rest of column(UnitPrice),                    which is 8827/121317=0.07275979

d)cardinality is 121317

dbcc show_statistics(SalesOrderDetail,I_ProductID_UnitPrice)

dbcc show_statistics(SalesOrderDetail,I_UnitPrice)

3)Let me pass the values to the formula:    p0 * p1^(1/2)  *cardinality

0.03864256*0.07275979^(1/2)*121317=0.0104235*121317=1264.547750, is rounded up to 1265.

4)Enable trace flag 2363 gives you more detail

Dbcc traceon(3604,2363,)

------------trace flag 2363 output-----------------

Selectivity: 0.0104235

Stats collection generated:

CStCollFilter(ID=2, CARD=1264.54)

CStCollBaseTable(ID=1, CARD=121317 TBL: SalesOrderDetail)

End selectivity computation

------------trace flag 2363 output-----------------

Co-Authors
Version history
Last update:
‎Nov 01 2022 11:18 AM
Updated by: