Aggregation WITH ROLLUP
Published Mar 23 2019 05:08 AM 899 Views
Microsoft
First published on MSDN on Sep 21, 2007

In this post, I'm going to discuss how aggregation WITH ROLLUP works.  The WITH ROLLUP clause permits us to execute multiple "levels" of aggregation in a single statement.  For example, suppose we have the following fictitious sales data.  (This is the same data that I used for my series of posts on the PIVOT operator.)



CREATE TABLE Sales (EmpId INT, Yr INT, Sales MONEY)
INSERT Sales VALUES(1, 2005, 12000)
INSERT Sales VALUES(1, 2006, 18000)
INSERT Sales VALUES(1, 2007, 25000)
INSERT Sales VALUES(2, 2005, 15000)
INSERT Sales VALUES(2, 2006, 6000)
INSERT Sales VALUES(3, 2006, 20000)
INSERT Sales VALUES(3, 2007, 24000)


We can write a simple aggregation query to compute the total sales by year:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr


As expected, this query returns three rows - one for each year:

Yr Sales
----------- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00

The query plan is a simple stream aggregate :


|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1010]=COUNT_BIG([Sales].[Sales]), [Expr1011]=SUM([Sales].[Sales])))
|--Sort(ORDER BY:([Sales].[Yr] ASC))
|--Table Scan(OBJECT:([Sales]))


Now suppose that we want to compute not just the sale by year but the total sales as well.  We could write a UNION ALL query:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr
UNION ALL
SELECT NULL, SUM(Sales) AS Sales
FROM Sales


This query works and does give the correct result:

Yr Sales
----------- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
NULL 120000.00

However, the query plan performs two scans and two aggregations (one to compute the sales by year and one to compute the total sales):


|--Concatenation
|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1023]=(0) THEN NULL ELSE [Expr1024] END))
|    |--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1023]=COUNT_BIG([Sales].[Sales]), [Expr1024]=SUM([Sales].[Sales])))
|         |--Sort(ORDER BY:([Sales].[Yr] ASC))
|              |--Table Scan(OBJECT:([Sales]))
|--Compute Scalar(DEFINE:([Expr1010]=NULL))
|--Compute Scalar(DEFINE:([Expr1009]=CASE WHEN [Expr1025]=(0) THEN NULL ELSE [Expr1026] END))
|--Stream Aggregate(DEFINE:([Expr1025]=COUNT_BIG([Sales].[Sales]), [Expr1026]=SUM([Sales].[Sales])))
|--Table Scan(OBJECT:([Sales]))


We can do better by adding a WITH ROLLUP clause to the original query:



SELECT Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP


This query is simpler to write and uses a more efficient query plan with only a single scan:


|--Compute Scalar(DEFINE:([Expr1004]=CASE WHEN [Expr1005]=(0) THEN NULL ELSE [Expr1006] END))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1005]=SUM([Expr1007]), [Expr1006]=SUM([Expr1008])))
|--Stream Aggregate(GROUP BY:([Sales].[Yr]) DEFINE:([Expr1007]=COUNT_BIG([Sales].[Sales]), [Expr1008]=SUM([Sales].[Sales])))
|--Sort(ORDER BY:([Sales].[Yr] ASC))
|--Table Scan(OBJECT:([Sales]))


The bottom stream aggregate in this query plan is the same as the stream aggregate in the original non-ROLLUP query.  This aggregation is a normal aggregation and, as such, it can be implemented using a stream aggregate (as in this example) or a hash aggregate (try adding an OPTION (HASH GROUP) clause to the above query).  It can also be parallelized.


The top stream aggregate is a special aggregate that computes the ROLLUP.  (Unfortunately, in SQL Server 2005 there is no way to discern from the query plan that this aggregate implements a ROLLUP.  This issue will be fixed in SQL Server 2008 graphical and XML plans.)  A ROLLUP aggregate is always implemented using stream aggregate and cannot be parallelized.  In this simple example, the ROLLUP stream aggregate merely returns each pre-aggregated input row while maintaining a running total of the Sales column.  After outputting the final input row, the aggregate also returns one additional row with the final sum.  Since SQL lacks a concept of an ALL value, the Yr column is set to NULL for this final row.  If NULL is valid value for Yr, we can identify the ROLLUP row using the GROUPING(Yr) construct.



SELECT
CASE WHEN GROUPING(Yr) = 0
THEN CAST (Yr AS CHAR(5))
ELSE 'ALL'
END AS Yr,
SUM(Sales) AS Sales
FROM Sales
GROUP BY Yr WITH ROLLUP

Yr Sales
----- ---------------------
2005 27000.00
2006 44000.00
2007 49000.00
ALL 120000.00

We can also compute multiple ROLLUP levels in a single query.  For example, suppose that we want to compute the sales first by employee and then for each employee by year:



SELECT EmpId, Yr, SUM(Sales) AS Sales
FROM Sales
GROUP BY EmpId, Yr WITH ROLLUP

EmpId Yr Sales
----------- ----------- ---------------------
1 2005 12000.00
1 2006 18000.00
1 2007 25000.00
1 NULL 55000.00
2 2005 15000.00
2 2006 6000.00
2 NULL 21000.00
3 2006 20000.00
3 2007 24000.00
3 NULL 44000.00
NULL NULL 120000.00

There are a couple of points worth noting about this query.  First, since the combination of the EmpId and Yr columns is unique, in the absence of the WITH ROLLUP clause, this query would just return the original data.  However, with the WITH ROLLUP clause the query produces a useful result.  Second, the order of the columns in the GROUP BY clause is relevant with the WITH ROLLUP clause.  To see why simply try the same query but reverse the EmpId and Yr columns.  Instead of computing the sales first by employee it will compute the sales first by year.


The query plan for this query is identical to the query plan for the prior query except that it groups on both the EmpId and Yr columns instead of on just the EmpId column.  Like the prior query plan, this query plan includes two stream aggregates: the bottom one which is a normal stream aggregate and the top one which computes the ROLLUP.  This ROLLUP stream aggregate actually computes two running totals: one which computes the total sales for an employee for all years and one which compute the total sales for all employees and all years.  This table shows how the ROLLUP computation proceeds:




EmpId


Yr


SUM(Sales) BY EmpId, Yr


SUM(Sales) BY EmpId


SUM(Sales)


1


2005


12000.00


12000.00


12000.00


1


2006


18000.00


30000.00


30000.00


1


2007


25000.00


55000.00


55000.00


1


NULL



55000.00


55000.00


2


2005


15000.00


15000.00


70000.00


2


2006


6000.00


21000.00


76000.00


2


NULL



21000.00


76000.00


3


2006


20000.00


20000.00


96000.00


3


2007


24000.00


44000.00


120000.00


3


NULL



44000.00


120000.00


NULL


NULL




120000.00


In my next post, I'll take a look at the WITH CUBE clause.  I'll discuss how it differs from WITH ROLLUP both in terms of function and in terms of its implementation.

Version history
Last update:
‎Mar 23 2019 05:08 AM
Updated by: