Automatic compute scaling with serverless for Hyperscale in Azure SQL Database
Published Feb 14 2023 02:59 PM

Optimizing resource allocation to achieve performance goals while controlling costs can be a challenging balance to strike, especially for database workloads with complex usage patterns.  Azure SQL Database serverless helps address these challenges, but until now, serverless has only been available in the General Purpose tier.  Many workloads that can benefit from serverless also require the greater performance and scale, along with other capabilities, unique to the Hyperscale tier.


We are pleased to announce the preview of serverless for Hyperscale in Azure SQL Database.  The benefits of serverless and Hyperscale now come together into a single database solution.


Combined benefits

Performance and scaling

  • Automatic scaling of compute up to 80 vcores and 240 GB memory per replica
  • Automatic scaling of local disk cache up to 720 GB per replica boosting IO performance
  • Automatic scaling of database storage up to 100 TB
  • Auto-scaling independence of the primary replica, high availability replicas, and named replicas
  • Auto-scaling independence of CPU and memory to match workload demand
  • Higher IO performance than the General Purpose tier
  • Read scale-out up to 30 named replicas

Business continuity

  • High availability configuration flexibility
  • Fast restore speed in minutes independent of data size

Price optimization

  • Price-performance optimized memory management
  • Billing for compute based on the amount used per second
  • Billing for storage based on the amount used per hour
  • Future release: automatic pausing and resuming, with billing only for storage when paused

Serverless Hyperscale (preview) optimizes price-performance and simplifies performance management for databases with intermittent and unpredictable usage while leaving headroom for growing to extreme scale and delivering high performance during active usage periods.  Serverless Hyperscale is also well-suited for new applications with compute sizing uncertainty or workloads requiring frequent rescaling in order to reduce costs.  The serverless compute tier for Hyperscale helps accelerate application development, minimize operational complexity, and lower total costs.


Compute scaling and billing

Serverless Hyperscale automatically scales compute and disk cache resources in response to the changing workload demand on the database.  The primary replica and any high availability (HA) replicas or named replicas each auto-scale independently for a serverless database in Hyperscale.  This independence adapts to the workload variability that can occur across replicas based on their application purpose, and optimizes resource allocation accordingly across the entire Hyperscale footprint.  The compute billed for each replica is based on the amount of CPU and memory used per second within a configurable compute range.
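As a rough illustration of per-second billing within a compute range, the sketch below models the vcores billed for one replica in one second.  It is a simplified model rather than the service's actual billing formula.  It assumes that memory is converted to vcore units at 3 GB per vcore (the ratio implied by the 80 vcore and 240 GB per-replica maximums listed above) and that usage is clamped to a configured minimum and maximum.  The function and parameter names are hypothetical.

```python
# Illustrative model only; not the service's actual billing implementation.
# Assumption: 3 GB of memory per vcore, taken from the 240 GB / 80 vcore
# per-replica maximums quoted in this post.
GB_PER_VCORE = 3.0


def billed_vcores(cpu_vcores_used, memory_gb_used, min_vcores, max_vcores):
    """Vcores billed for one replica during one second (hypothetical helper)."""
    # Express memory usage in vcore units so it can be compared with CPU usage.
    memory_as_vcores = memory_gb_used / GB_PER_VCORE
    # Whichever resource is in higher demand drives the billed amount.
    demand = max(cpu_vcores_used, memory_as_vcores)
    # Keep the billed amount inside the configured compute range.
    return min(max(demand, min_vcores), max_vcores)


def billed_vcore_seconds(samples, min_vcores, max_vcores):
    """Sum billed vcores over per-second (cpu_vcores, memory_gb) samples."""
    return sum(
        billed_vcores(cpu, mem, min_vcores, max_vcores) for cpu, mem in samples
    )


# Three seconds of usage on a replica configured for 2 to 16 vcores:
# idle, CPU-bound, then memory-bound.
samples = [(0.0, 0.0), (4.0, 6.0), (1.0, 24.0)]
total = billed_vcore_seconds(samples, min_vcores=2, max_vcores=16)
```

Because each replica auto-scales independently, a model like this would be applied to each replica's own usage samples and the results billed separately.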


Serverless Hyperscale contrasts with provisioned compute Hyperscale, which allocates a fixed amount of compute resources for a fixed price and is billed per hour.  Over short time scales, provisioned compute databases must either over-provision resources at a cost in order to accommodate peak usage, or under-provision and risk poor performance.  Over longer time scales, provisioned compute databases can be rescaled, but this may require predicting usage patterns or writing custom logic to trigger rescaling operations based on a schedule or performance metrics, which adds development and operational complexity.  In serverless, compute scaling within configurable limits is managed natively by the service to continuously and quickly right-size resources.


For example, consider an IoT application using a serverless Hyperscale database with sporadic usage that requires multi-core bursting headroom throughout the day.  Suppose the primary replica is configured to allow auto-scaling up to 16 vcores and has the following usage pattern over a 1 hour period:


Figure: Primary replica usage pattern



Further suppose the database is scaled out with one named replica for read-only analytics scenarios and also configured to allow auto-scaling up to 16 vcores with the following usage pattern over the same 1 hour period:


Figure: Named replica usage pattern



Database usage corresponds to the amount of compute billed, which is measured in units of vcore seconds.  Over the 1 hour period, the compute billed sums to around 13k vcore seconds for the primary replica and around 11k vcore seconds for the named replica.  Suppose the serverless compute unit price is around $0.000163/vcore-second for the primary replica and around $0.000105/vcore-second for the named replica.  Multiplying the compute unit price by the total number of vcore seconds billed for each replica gives a compute cost over this 1 hour period of approximately $2.12 for the primary replica and $1.15 for the named replica.  During this time period, the compute for each replica independently scaled from idle usage up to nearly 100 percent of 16 vcores in response to unpredictable bursting episodes and without any customer intervention.  In this example, the price savings from using serverless are significant compared to using provisioned compute Hyperscale configured with the same 16 vcore limit.


Pricing in this example is preview pricing for the East US region as of February 2023 and is subject to change.
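The arithmetic in this example can be reproduced with a few lines of Python.  This is a minimal sketch that uses the approximate vcore-second totals and unit prices quoted above.  The helper names are hypothetical, and the average vcore figure is derived here for illustration rather than taken from the example.

```python
SECONDS_PER_HOUR = 3600


def compute_cost(vcore_seconds, unit_price_per_vcore_second):
    """Compute cost = vcore seconds billed x unit price per vcore-second."""
    return vcore_seconds * unit_price_per_vcore_second


def average_vcores(vcore_seconds, period_seconds=SECONDS_PER_HOUR):
    """Average number of vcores billed over the period."""
    return vcore_seconds / period_seconds


# Approximate figures from the example (East US preview pricing, Feb 2023).
primary_cost = compute_cost(13_000, 0.000163)
named_cost = compute_cost(11_000, 0.000105)

primary_avg = average_vcores(13_000)
named_avg = average_vcores(11_000)

print(f"primary: ${primary_cost:.3f} at {primary_avg:.1f} average vcores")
print(f"named:   ${named_cost:.3f} at {named_avg:.1f} average vcores")
```

The results come to roughly $2.12 and $1.155, in line with the approximate costs in the example.  Each replica averages fewer than 4 billed vcores against a 16 vcore limit, which is why paying only for the compute used is attractive for this usage pattern.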


Price-performance trade-offs

The two price-performance trade-offs to consider with serverless Hyperscale are the compute unit price and the performance impact of compute warm-up after periods of low or idle usage.


Compute unit price

The serverless compute unit price is higher than for provisioned compute within Hyperscale since serverless is optimized for workloads with intermittent usage patterns.  If CPU or memory usage is high enough and sustained for long enough, then the provisioned compute tier may be less expensive.
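One way to reason about this trade-off is to estimate the average utilization at which the two compute tiers cost the same.  The sketch below is illustrative only.  The provisioned price used in it is a placeholder assumption and not a published price, and the function names are hypothetical.  The serverless unit price is the approximate primary replica price from the example above.

```python
SECONDS_PER_HOUR = 3600


def break_even_utilization(serverless_price_per_vcore_second,
                           provisioned_price_per_vcore_hour):
    """Fraction of the provisioned vcores that serverless must bill, on
    average, for both tiers to cost the same."""
    serverless_price_per_vcore_hour = (
        serverless_price_per_vcore_second * SECONDS_PER_HOUR
    )
    return provisioned_price_per_vcore_hour / serverless_price_per_vcore_hour


def cheaper_tier(avg_billed_vcores, provisioned_vcores,
                 serverless_price_per_vcore_second,
                 provisioned_price_per_vcore_hour):
    """Compare the hourly compute cost of the two tiers for one replica."""
    serverless_hourly = (
        avg_billed_vcores * serverless_price_per_vcore_second * SECONDS_PER_HOUR
    )
    provisioned_hourly = provisioned_vcores * provisioned_price_per_vcore_hour
    return "serverless" if serverless_hourly < provisioned_hourly else "provisioned"


# Approximate serverless primary replica price quoted in this post.
SERVERLESS_PRICE = 0.000163
# PLACEHOLDER assumption for illustration only; substitute the real
# provisioned Hyperscale price per vcore-hour for your region.
PLACEHOLDER_PROVISIONED_PRICE = 0.30

threshold = break_even_utilization(SERVERLESS_PRICE, PLACEHOLDER_PROVISIONED_PRICE)
light_load = cheaper_tier(3.6, 16, SERVERLESS_PRICE, PLACEHOLDER_PROVISIONED_PRICE)
heavy_load = cheaper_tier(12, 16, SERVERLESS_PRICE, PLACEHOLDER_PROVISIONED_PRICE)
```

With the placeholder price, the break-even point lands at a little over half of the provisioned vcores.  A replica averaging 3.6 of 16 vcores favors serverless, while one averaging 12 of 16 favors provisioned compute.  The real threshold depends on actual regional prices and on memory usage, which this sketch ignores.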


Compute warm-up

The SQL memory cache and resilient buffer pool extension (RBPEX) local disk cache for each compute replica in serverless Hyperscale are gradually reclaimed if CPU or memory usage is low enough for long enough.  When workload activity returns, disk IO may be required to rehydrate data pages into the SQL buffer pool memory cache or RBPEX disk cache, and query plans may need to be recompiled.  This cache reclamation policy based on low usage is unique to serverless and is applied to control costs, but it can impact performance.  Cache reclamation based on low usage does not occur in the provisioned compute tier, where this kind of impact can be avoided.


Learn more

Azure SQL Database serverless auto-scaling is now in preview for the Hyperscale tier.  Serverless auto-pausing and resuming in Hyperscale is planned for a future release.

Last update: Feb 16 2023 01:16 AM