Approximate query processing was introduced to enable operations across large data sets where responsiveness is more critical than absolute precision. Approximate operations can be used effectively for scenarios such as KPI and telemetry dashboards, data science exploration, anomaly detection, and big data analysis and visualization. Approximate query processing family has enabled a new market of big data HTAP customer scenarios, including fast-performing dashboard and data science exploration requirements.
Today, we are announcing preview of native implementation of APPROX_PERCENTILE in Azure SQL Database and Azure SQL Managed Instance. This function will calculate the approximated value at a provided percentile from a distribution of numeric values.
Imagine a table of temperature sensor data which has millions of rows, and you want to find 99th percentile quickly with acceptable rank-based error bound to take quick decisions. These functions can give output in a single pass with less CPU and memory usage. Here is the quick demonstration to show that (we have used order_line table from TPCC database).
Exact Percentile calculation for 50th percentile ol_amount for warehouses (ID between 1 to 500)
SELECT DISTINCT ol_w_id
,PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ol_amount)
OVER (PARTITION BY ol_w_id) AS MedianCont
,PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ol_amount)
OVER (PARTITION BY ol_w_id) AS MedianDisc
FROM order_line
WHERE ol_w_id BETWEEN 1 AND 500
ORDER BY ol_w_id
Memory grant information from XML plan.
<QueryPlan DegreeOfParallelism="8" MemoryGrant="33230744" CachedPlanSize="120" CompileTime="8" CompileCPU="8" CompileMemory="1008">
Same calculation using new approximate percentile functions.
SELECT ol_w_id,
APPROX_PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ol_amount) AS MedianCont,
APPROX_PERCENTILE_DISC(0.5) WITHIN GROUP (ORDER BY ol_amount) AS MedianDisc
FROM order_line
WHERE ol_w_id BETWEEN 1 AND 500
GROUP BY ol_w_id
ORDER BY ol_w_id
Memory grant information from XML plan.
<QueryPlan DegreeOfParallelism="8" MemoryGrant="50688" CachedPlanSize="56" CompileTime="4" CompileCPU="4" CompileMemory="464">
As we can see the memory grant for approximate version is significantly less as compared to exact version.
Further references:
Link to documentation for approximate percentile functions
- APPROX_PERCENTILE_CONT (Transact-SQL) - SQL Server | Microsoft Learn
- APPROX_PERCENTILE_DISC (Transact-SQL) - SQL Server | Microsoft Learn