General availability: serverless for Hyperscale in Azure SQL Database

Morgan_Oslake · ‎Feb 13 2024

Optimizing resource allocation to achieve performance goals while controlling costs can be a challenging balance to strike, especially for database workloads with complex usage patterns. Azure SQL Database serverless helps address these challenges, but until now serverless has been generally available only in the General Purpose tier. However, many workloads that can benefit from serverless may require greater performance and scale, along with other capabilities unique to the Hyperscale tier.

We are pleased to announce the general availability of serverless auto-scaling for Hyperscale in Azure SQL Database. The benefits of serverless and Hyperscale now come together into a single database solution.

Combined benefits

Performance and scaling

Automatic scaling of compute up to 80 vcores and 240 GB memory per replica.
Automatic scaling of local SSD cache up to 720 GB per replica boosting IO performance.
Automatic scaling of database storage up to 100 TB.
Auto-scaling independence of the primary replica, high availability replicas, and named replicas.
Auto-scaling independence of CPU and memory to match workload demand.
Higher IO performance than the General Purpose tier.
Read scale-out up to 30 named replicas.

Business continuity

High availability configuration flexibility.
Fast restore speed in minutes independent of data size.

Price optimization

Price-performance optimized memory management.
Billing for compute based on the amount used per second.
Billing for storage based on the amount used per hour.
Future release: automatic pausing & resuming, and billing only for storage when paused.

Serverless Hyperscale optimizes price-performance and simplifies performance management for databases with intermittent and unpredictable usage while leaving headroom for growing to extreme scale and delivering high performance during active usage periods. Serverless Hyperscale is also well-suited for new applications with compute sizing uncertainty or workloads requiring frequent rescaling in order to reduce costs. The serverless compute tier for Hyperscale helps accelerate application development, minimize operational complexity, and lower total costs.

Compute scaling and billing

Serverless Hyperscale automatically scales compute and cache resources in response to the changing workload demand of the database. The primary replica and any high availability (HA) replicas or named replicas each auto-scale independently for a serverless database in Hyperscale. This auto-scaling independence adapts to workload variability that can occur across replicas based on their application purpose and optimizes resource allocation accordingly across the entire Hyperscale footprint. The compute billed for each replica is based on the amount of CPU and memory used per second within a configurable compute range.
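As a rough illustration of per-second billing within a compute range, the sketch below clamps each second's demand between a minimum and maximum vcore setting. It is not the service's official billing formula. It assumes that memory is converted to vcores at 3 GB per vcore (the ratio implied by the 80 vcore and 240 GB maximums listed above) and that the larger of CPU and memory demand is what gets billed. The function names, the 2 vcore minimum, and the sample values are all hypothetical.

```python
def billed_vcores(cpu_vcores_used, memory_gb_used, min_vcores, max_vcores,
                  gb_per_vcore=3.0):
    """Estimate the vcores billed for one second of usage.

    Assumption (not from the article): memory is converted to a vcore
    equivalent at gb_per_vcore, and the larger of CPU and memory demand
    is billed, clamped to the configured [min_vcores, max_vcores] range.
    """
    memory_equivalent = memory_gb_used / gb_per_vcore
    demand = max(cpu_vcores_used, memory_equivalent)
    return min(max(demand, min_vcores), max_vcores)


def billed_vcore_seconds(samples, min_vcores, max_vcores):
    """Sum billed vcores over per-second (cpu_vcores, memory_gb) samples."""
    return sum(
        billed_vcores(cpu, mem, min_vcores, max_vcores) for cpu, mem in samples
    )


# Hypothetical per-second samples for one replica: (vcores used, GB used).
samples = [(0.1, 0.6), (4, 24)]
total = billed_vcore_seconds(samples, min_vcores=2, max_vcores=16)
print(f"billed vcore seconds: {total}")
```

Because each replica auto-scales independently, a calculation like this would be run separately on the samples of the primary, HA, and named replicas.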

Serverless Hyperscale contrasts with provisioned compute Hyperscale, which allocates a fixed amount of compute resources for a fixed price and is billed per hour. Over short time scales, provisioned compute databases must either over-provision resources at a cost in order to accommodate peak usage or under-provision and risk poor performance. Over longer time scales, provisioned compute databases can be rescaled, but this solution may require predicting usage patterns or writing custom logic to trigger rescaling operations based on a schedule or performance metrics, which adds development and operational complexity. In serverless, compute scaling within configurable limits is managed natively by the service to continuously and quickly right-size resources.

For example, consider an IoT application using a serverless Hyperscale database with sporadic usage that requires multi-core bursting headroom throughout the day. Suppose the primary replica with one HA replica is configured to allow auto-scaling up to 16 vcores with the following usage pattern over a 1 hour period:

[Figure: usage pattern for the primary replica]

Further suppose the database is scaled out with one named replica for read-only analytics scenarios and also configured to allow auto-scaling up to 16 vcores with the following usage pattern over the same 1 hour period:

[Figure: usage pattern for the named replica]

As can be seen, database usage corresponds to the amount of compute billed. The compute billed is measured in units of vcore seconds and over the 1 hour period sums to around 9k vcore seconds for the primary replica, 7k vcore seconds for the HA replica, and 9k vcore seconds for the named replica. Suppose the serverless compute unit price for each replica in the East US region is around $0.000105/vcore-second (subject to change). Then the compute cost over this 1 hour period is approximately $1.67 for the primary replica and its HA replica plus $0.94 for the named replica. These costs are calculated by multiplying the compute unit price by the total number of vcore seconds billed for each replica. During this time period, the compute for the primary and named replicas independently scaled from idle usage up to nearly 100 percent of 16 vcores in response to unpredictable bursting episodes and without any customer intervention. In this example, the price savings from using serverless are significant compared to using provisioned compute Hyperscale configured with the same 16 vcore limit.
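The cost arithmetic in this example can be reproduced with a few lines of Python. The sketch below uses the rounded figures quoted above (9k, 7k, and 9k vcore seconds, and a unit price of $0.000105 per vcore-second), so its results differ slightly from the article's approximate totals, which are presumably based on unrounded usage. The variable and function names are illustrative only.

```python
# Approximate unit price quoted in the example (USD per vcore-second).
UNIT_PRICE = 0.000105

# Rounded vcore seconds billed per replica over the 1 hour period.
billed = {"primary": 9_000, "ha": 7_000, "named": 9_000}


def compute_cost(vcore_seconds, unit_price=UNIT_PRICE):
    """Compute cost = unit price x vcore seconds billed."""
    return vcore_seconds * unit_price


costs = {name: compute_cost(seconds) for name, seconds in billed.items()}
primary_plus_ha = costs["primary"] + costs["ha"]

print(f"primary + HA replica: ${primary_plus_ha:.2f}")
print(f"named replica:        ${costs['named']:.3f}")
```

With these rounded inputs the primary replica and its HA replica come to about $1.68 and the named replica to about $0.945, in line with the approximate $1.67 and $0.94 stated above.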

Price-performance trade-offs

Price-performance trade-offs to consider when using serverless for Hyperscale are related to the compute unit price and impact on performance due to compute warm-up after periods of low or idle usage.

Compute unit price

The serverless compute unit price is higher than for provisioned compute within Hyperscale since serverless is optimized for workloads with intermittent usage patterns. If CPU or memory usage is high enough and sustained for long enough, then the provisioned compute tier may be less expensive.
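One way to reason about this trade-off is to estimate the usage level at which the two compute tiers cost the same over an hour. The sketch below is a simplified model under stated assumptions: provisioned compute is billed at a flat price per vcore-hour for a fixed vcore count, and serverless is billed per vcore-second. The provisioned price used in the usage lines is a made-up placeholder, not a published rate, and the function names are hypothetical. Check the Azure SQL Database pricing page for real values.

```python
SECONDS_PER_HOUR = 3600


def break_even_vcore_seconds(provisioned_price_per_vcore_hour,
                             provisioned_vcores,
                             serverless_price_per_vcore_second):
    """Billed vcore seconds per hour at which serverless cost equals
    the hourly cost of a provisioned database of the given size."""
    provisioned_hourly_cost = provisioned_price_per_vcore_hour * provisioned_vcores
    return provisioned_hourly_cost / serverless_price_per_vcore_second


def break_even_average_vcores(provisioned_price_per_vcore_hour,
                              provisioned_vcores,
                              serverless_price_per_vcore_second):
    """Same break-even point expressed as average billed vcores over the hour."""
    return break_even_vcore_seconds(
        provisioned_price_per_vcore_hour,
        provisioned_vcores,
        serverless_price_per_vcore_second,
    ) / SECONDS_PER_HOUR


# Hypothetical inputs: the provisioned price is a placeholder, not a real rate.
avg = break_even_average_vcores(
    provisioned_price_per_vcore_hour=0.25,       # assumed, USD per vcore-hour
    provisioned_vcores=16,
    serverless_price_per_vcore_second=0.000105,  # example price from this post
)
print(f"serverless costs more above ~{avg:.1f} average billed vcores")
```

If the average billed vcores of a workload stays above the break-even value for sustained periods, this model suggests provisioned compute would be the less expensive option. Below it, serverless would cost less.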

Compute warm-up

The SQL memory cache and resilient buffer pool extension (RBPEX) local disk cache for each compute replica in serverless Hyperscale are gradually reclaimed if CPU or memory usage is low enough for long enough. When workload activity returns, disk IO may be required to rehydrate data pages into the SQL buffer pool memory cache or RBPEX disk cache, or query plans may need to be recompiled. This cache reclamation policy based on low usage is unique to serverless and is done to control costs, but it can impact performance. Cache reclamation based on low usage does not occur in the provisioned compute tier, where this kind of impact can be avoided.

Learn more

Azure SQL Database serverless auto-scaling is now generally available for the Hyperscale tier. Serverless auto-pausing and resuming in Hyperscale is planned in a future release.

Azure SQL Database serverless and Hyperscale.
Azure SQL Database pricing for serverless Hyperscale.
