Announcing General Availability of Approximate Percentile Functions for Azure SQL DB and MI

Microsoft

Mar 01, 2023

Today, we are announcing General Availability (GA) of native implementation of APPROX_PERCENTILE in Azure SQL Database and Azure SQL Managed Instance. We announced preview of these functions in October 2022. Since then, many customers have adopted these for the applications where response time of percentile calculation was more important than the accuracy of the result.

These aggregate functions (APPROX_PERCENTILE_CONT & APPROX_PERCENTILE_DISC) need lesser memory and lesser CPU as compared to their precise analytic counterparts (PERCENTILE_CONT & PERCENTILE_DISC). Here is a quick example to demonstrate this.

To follow along, you would need to use WideWorldImportersDW sample database and increase the number of rows using below script.

https://github.com/microsoft/bobsql/blob/master/sql2019book/ch2_intelligent_performance/iqp/extendwwidw.sql

For this demo, we have imported WideWorldImportersDW-Full.bacpac to a Azure SQL Database – Business Critical tier and made it bigger by using script mentioned above. Let's start with looking at the 95th percentile for order total (including tax) by employee using PERCENTILE_CONT

DBCC DROPCLEANBUFFERS
GO
SET STATISTICS TIME,IO ON
GO
SELECT DISTINCT de.Employee
	, PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY [Total Including Tax]) OVER (PARTITION BY de.Employee) AS [95 percentile CONT] 
FROM Fact.OrderHistory AS foh
	INNER JOIN Dimension.Employee AS de ON foh.[Salesperson Key] = de.[Employee Key]
ORDER BY de.Employee;
GO
SET STATISTICS TIME,IO OFF

Here is the output (only partial shown to save some electrons).

Statistics IO

Table 'Employee'. Scan count 1, logical reads 5, physical reads 1, page server reads 0, read-ahead reads 10, page server
Table 'OrderHistory'. Scan count 9, logical reads 101000, physical reads 0, page server reads 0, read-ahead reads 100937
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server 
Table 'Worktable'. Scan count 44, logical reads 21189783, physical reads 0, page server reads 0, read-ahead reads 19097,
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server

Statistics time

SQL Server Execution Times:
   CPU time = 50469 ms,  elapsed time = 12969 ms.

Now let's look at how to achieve the same thing with APPROX_PERCENTILE

DBCC DROPCLEANBUFFERS
GO
SET STATISTICS TIME,IO ON
GO
SELECT de.Employee
	, APPROX_PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY [Total Including Tax] ) AS [95 percentile APPROX_CONT] 
FROM Fact.OrderHistory AS foh
	INNER JOIN Dimension.Employee AS de ON foh.[Salesperson Key] = de.[Employee Key]
GROUP BY de.Employee
ORDER BY de.Employee;
GO
SET STATISTICS TIME,IO OFF

Statistics IO

Table 'OrderHistory'. Scan count 9, logical reads 101000, physical reads 0, page server reads 0, read-ahead reads 100937, page serve
Table 'Employee'. Scan count 0, logical reads 202, physical reads 3, page server reads 0, read-ahead reads 0, page server read-ahead
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 101, page server read-ahea
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead r
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, page server reads 0, read-ahead reads 0, page server read-ahead

Statistics time

SQL Server Execution Times:
   CPU time = 11096 ms,  elapsed time = 1583 ms.

If we compare the elapsed time (in Statistics time output), it is ~1.5 seconds vs. ~13 seconds. IO also has significantly reduced using approximate function (in Statistics IO output). Since each query plan has parallelism, we are seeing more CPU time than elapsed time.

Here is the memory used by percentile (taken from XML plan)

<MemoryGrantInfo SerialRequiredMemory="3072" SerialDesiredMemory="383784" RequiredMemory="27080" DesiredMemory="407816" RequestedMemory="407816" GrantWaitTime="0" GrantedMemory="407816" MaxUsedMemory="204872" MaxQueryMemory="6471920" />

and from approx_percentile query plan

<MemoryGrantInfo SerialRequiredMemory="1536" SerialDesiredMemory="2944" RequiredMemory="13512" DesiredMemory="14920" RequestedMemory="14920" GrantWaitTime="0" GrantedMemory="14920" MaxUsedMemory="3640" MaxQueryMemory="6471920" />

Above snippets show low memory requirement of approx_percentile family of function.

Here are the links to documentation for approximate percentile functions to learn more.

Updated Mar 01, 2023

Version 1.0

azure sql

azure sql database

azure sql managed instance

sql server

sql server on azure vms

Balmukund-Lakhani

Microsoft

Joined October 02, 2017

View Profile

Azure SQL Blog

Follow this blog board to get notified when there's new activity

5 Comments

Ronen_Ariely
MVP
Mar 20, 2023
Hi Francesco_beqom

Just tini clarification...

I am pretty sure that you meant "Accuracy" which is what Balmukund-Lakhani answered, and not "Precision" even so the question regarding Precision is legit as well.

Accuracy refers to how close a measurement is to the true or accepted value. Precision refers to how close measurements of the same item are to each other. Precision is independent of accuracy but accuracy is depended by precision. A bad precision always means low accuracy but not vice versa.

https://en.wikipedia.org/wiki/Accuracy_and_precision
Balmukund-Lakhani
Microsoft
Mar 20, 2023
Hi Francesco_beqom

Thanks for the comment.

We have documented that these functions provide rank-based error guarantees not value based. The function implementation guarantees up to a 1.33% error. So basically, if you arrange them in order then rank wise, result might be off max by 1.33%.
Francesco_beqom
Brass Contributor
Mar 20, 2023
"[...] where response time of percentile calculation was more important than the accuracy of the result."

Now I'm glad you show the statistics about how fast this implementation is.
However I would have loved an example of the precision that we are going to lose shifting to APPROX_PERCENTILE_CONT.

Any idea?
Balmukund-Lakhani
Microsoft
Mar 13, 2023
I have used Business Critical tier in my demo so didn't notice error. Thanks for sharing your experience.
Ronen_Ariely
MVP
Mar 07, 2023
Note! Attempting to import the bacpac to Azure SQL DB or to SQL Server 2022 returns errors.

Update! I found the issue.

this bacpac is for Azure SQL Databases in the Premium tier only.

This information we can find here: https://github.com/Microsoft/sql-server-samples/releases/tag/wide-world-importers-v1.0

Blog Post

Announcing General Availability of Approximate Percentile Functions for Azure SQL DB and MI