Azure SQL DB Design Note: Choosing the Right Database Reservation Size
Published Sep 24, 2019

Azure SQL Database started out with very simple options: you could pick between two database sizes (1 GB and 5 GB). It was a packaged version of SQL Server with a number of features built in (backups, high availability), but it did not offer the flexibility of on-premises environments, where you can pick any hardware you want on which to run your database. Over the years, Azure SQL Database has evolved at a seemingly ever-increasing rate, adding a plethora of new purchase models and options. This blog post outlines the advice the SQL team gives to customers when choosing a tier, so you can pick the reservation size that best addresses the needs of a given application.

 

DTU vs. vCore

Azure SQL DB currently supports two different purchasing models. The earlier model, called DTU (Database Transaction Units), is an abstraction where pricing is fully decoupled from the underlying hardware. Service tiers were qualified based on the approximate throughput of a concurrent OLTP benchmark. For example, a customer might buy 100 DTU, which would translate to roughly 100 transactions per second for a given reservation size. Internally, these map roughly to fractions of a machine that a customer pays to have provisioned continuously. More importantly, those fractions are fixed by the ratios of whatever the underlying hardware can do. If 100 DTU internally represented about 1/8th of a machine, for example, you would get 1/8th of the CPU cores, 1/8th of the memory, 1/8th of the storage, 1/8th of the IOPS, 1/8th of the log transaction commit rate, 1/8th of the tempdb space and IOPS, and so on. Since the ratio is fixed across every resource dimension, a customer needs to provision enough capacity to cover whichever dimension is highest. This differs a bit from the model used in SQL Server, where you buy licenses for cores and can separately provision a machine with more RAM, or a flash-based SSD for tempdb, if specific resource dimensions are more demanding for a given workload.
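
To make the fixed-ratio point concrete, here is a small back-of-the-envelope sketch in Python. The per-DTU resource ratios and the workload numbers are made-up illustrations, not published figures; the point is only that the reservation size ends up being driven by the single most demanding dimension.

```python
# Back-of-the-envelope sketch of the fixed-ratio idea behind DTUs.
# The per-DTU ratios and the workload numbers are made-up illustrations.
PER_DTU = {                # hypothetical resources granted per DTU
    "cpu_cores": 0.01,
    "memory_gb": 0.03,
    "iops": 5.0,
    "log_mb_per_sec": 0.05,
}

workload_peak = {          # one workload's peak need per dimension
    "cpu_cores": 0.5,
    "memory_gb": 2.0,
    "iops": 2500.0,        # IO-heavy: this dimension dominates
    "log_mb_per_sec": 3.0,
}

# Every dimension scales together, so the reservation must be sized to the
# single most demanding dimension.
dtus_needed = max(workload_peak[k] / PER_DTU[k] for k in workload_peak)
print(f"DTUs required: {dtus_needed:.0f}")  # driven by IOPS, not CPU
```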

The second, newer purchasing model is called the "vCore" model. It was created in response to customer feedback on DTUs. The vCore model surfaces the underlying hardware more directly (you can see the generation of CPUs being used) and gives customers the ability to purchase IO and storage separately from CPU and memory. This means that if an application workload is IO intensive but not very CPU intensive, for example, it might be cheaper under the vCore model because you can buy fewer provisioned CPU cores than you would need under DTU-based pricing.
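
Continuing the sketch above, here is a deliberately simplified cost comparison. All unit prices are hypothetical placeholders rather than real Azure rates, and real vCore bills break out compute, storage, and IO differently by tier; the sketch only illustrates why decoupling the dimensions can help an IO-heavy, CPU-light workload.

```python
# Hypothetical unit prices only -- not real Azure rates. Real vCore bills
# break out compute, storage, and IO differently depending on the tier.
DTU_PRICE = 1.50       # $/DTU/month
VCORE_PRICE = 180.00   # $/vCore/month
IOPS_PRICE = 0.05      # $/provisioned IOPS/month

# The IO-heavy workload above needed ~500 DTUs just to satisfy its IOPS.
dtu_cost = 500 * DTU_PRICE

# Under vCore, compute and IO are purchased independently: 2 vCores cover
# the CPU need, and the IOPS are provisioned on their own.
vcore_cost = 2 * VCORE_PRICE + 2500 * IOPS_PRICE

print(f"DTU model:   ${dtu_cost:,.2f}/month")
print(f"vCore model: ${vcore_cost:,.2f}/month")
```

In practice, run the numbers with the published prices for your region and tier before deciding.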

 

Service Tier (Basic/Standard/Premium for DTU; General Purpose/Business Critical/Hyperscale for vCore)

DTU and vCore have somewhat different storage model offerings, and knowing a bit about how Azure SQL DB is architected helps in understanding which choice is best for a given application. Please note that the Azure SQL Database team continuously optimizes and improves the service under the covers, so some of the details may change over time.

| Storage Model | Implementation |
| --- | --- |
| "Remote SSD" storage | Azure Storage hosted files attached to a virtual machine hosting a SQL instance (the exact storage performance tier varies based on the Azure SQL DB reservation size). Latency is higher than local SSD. |
| "Local SSD" storage | Local SSD-based files for the database, log, and tempdb running on the same virtual machine as the Azure SQL DB instance. Additionally, 2-3 other replicas exist for high availability (also on local SSD storage). Database size is limited to 4 TB by the underlying hardware hosting the virtual machine. |
| Hyperscale | Hybrid storage model where the database storage engine is split up and hosted on different nodes. Some data is hosted on remote storage and cached on page servers using memory plus local SSD-based storage. This design allows for databases much larger than 4 TB. |

 

The DTU model abstracts the choice of internal storage model away from you, but you can surmise from the performance characteristics of Basic vs. Standard vs. Premium that Basic does not need local SSD storage and Premium likely does in order to deliver its performance; the Standard tier is split, with the lower-end sizes using remote storage and the higher-end sizes using local storage. The vCore model makes this choice explicit: the General Purpose tier maps to remote storage, while the Business Critical tier maps to local storage.

Customers familiar with SQL Server in on-premises environments should think of the “local storage” option as being similar to running an AlwaysOn Availability Group with multiple local replicas and the highest guarantees on uptime in case of a failure or failover. The “general purpose” model is closer to running a single virtual machine over a SAN where Azure SQL DB will provision another virtual machine from free capacity when a machine dies or is upgraded. The Hyperscale model does not have a direct analogue in the on-premises world of SQL Server but is conceptually in-between and is somewhat akin to a scale-out version of the Buffer Pool Extensions (BPE) feature.

Azure SQL Database can also run across Availability Zones. In the local storage model, this means the replicas are spread across multiple datacenters instead of all being hosted in a single building. In the remote storage model, it means Azure Storage provides similar protection by keeping files accessible across multiple zones in case one becomes unavailable.

You can read more about the DTU model here: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-dtu. More information about the vCore model is here: https://docs.microsoft.com/en-us/azure/sql-database/sql-database-service-tiers-vcore

 

Single Database vs. Elastic Pools

The normal provisioning model for Azure SQL Database provides guaranteed resources that are available all the time (24 hours/day x 7 days/week). Each database is given those resources whether it needs them or not. Some customers, however, have usage patterns where the total resource requirements are spread across many databases instead of one. For example, an ISV hosted on Azure SQL Database may have one database per customer and know that not all customers will be active at once. In this situation, it makes more sense to share the same resources across many databases to save money. In Azure SQL Database, this is called the elastic pool model. Note that you can move between resource sizes dynamically in Azure SQL Database, so scaling up and down is done mostly transparently to your application (including moving between single databases and elastic pools).
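
A quick sketch shows why pooling can be cheaper. The usage traces below are randomly generated stand-ins for real tenant activity; the idea is that the pool only needs to be provisioned for the peak of the combined load, which is usually far less than the sum of the individual peaks.

```python
# The usage traces are randomly generated stand-ins for real tenant activity.
import random

random.seed(42)
HOURS, NUM_DBS = 24, 20

# Each hypothetical tenant database is mostly idle with occasional spikes.
traces = [[random.choice([1, 1, 1, 1, 50]) for _ in range(HOURS)]
          for _ in range(NUM_DBS)]

sum_of_peaks = sum(max(t) for t in traces)                           # size each DB for its own peak
peak_of_sum = max(sum(t[h] for t in traces) for h in range(HOURS))   # size one shared pool

print(f"Separate databases need roughly: {sum_of_peaks} DTUs in total")
print(f"One elastic pool needs roughly:  {peak_of_sum} DTUs")
```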

 

Serverless vs. Provisioned Compute Tier

Until recently, Azure SQL Database only supported a fully provisioned model (for both the DTU and vCore purchasing models). As of this writing, a "serverless" option is also available in which the database can go into a paused state when not in use. This model is ideal for workloads that are not busy all the time. For example, if a test environment is only used once a week to validate code before it ships to production, it may be cheaper to run using the serverless option. The pricing for serverless is somewhat higher than provisioned while active (to account for keeping resources ready to resume a workload immediately at login) but much lower while paused (priced at the cost of storage). This gives customers more options to control how they are billed based on the resource characteristics of their workload. The serverless option is currently in public preview.
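
As a rough illustration, here is a back-of-the-envelope comparison for an intermittently used database. The hourly rates are hypothetical placeholders, not published prices (real serverless compute is billed per vCore-second used), but they show why paying only for active hours can be much cheaper for occasional workloads.

```python
# Hypothetical hourly rates, not published prices. Real serverless compute is
# billed per vCore-second used; storage is billed in both models and omitted.
HOURS_PER_MONTH = 730

provisioned_rate = 0.50        # $/hour, billed around the clock
serverless_active_rate = 0.65  # $/hour, billed only while the database is running

active_hours = 8 * 4           # e.g. a test database used 8 hours/day, 4 days/month

provisioned_compute = HOURS_PER_MONTH * provisioned_rate
serverless_compute = active_hours * serverless_active_rate

print(f"Provisioned compute: ${provisioned_compute:.2f}/month")
print(f"Serverless compute:  ${serverless_compute:.2f}/month")
```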

 

Choosing the Right One

There are a lot of choices here, and it can be a bit overwhelming. However, in many cases it's not hard to pick the right model for your database given a few key inputs. The following table outlines the main recommendations we give to customers.

| Choice | Recommendation | Explanation |
| --- | --- | --- |
| DTU vs. vCore | vCore is the right choice for most new applications, while existing applications can stay on the DTU model if they prefer. Low-resource/small databases can choose between vCore Serverless and the Basic/Standard DTU model, whichever works better. | The vCore model provides the most flexibility for tuning CPU/memory vs. storage/IOPS, as well as choosing the specific hardware generation on which to run an application. Additionally, the Serverless (preview) option gives pricing flexibility for occasional workloads. The DTU model remains for cases where customers do not need this flexibility, or where resource needs are small enough to fit into Basic/Standard and busy enough not to benefit from Serverless. |
| Compute Tier (Serverless vs. Provisioned) | The provisioned model is ideal for 24/7 production applications/services. Non-production systems can use provisioned (perhaps in lower reservation sizes) or serverless. | Serverless is still in preview. |
| Storage Tier | Critical production databases should only run on local SSDs (generally Premium/Business Critical); they have the best performance and the highest availability SLA. Less critical or lower-resource databases can run on General Purpose (examples: test systems, non-customer-facing systems). Databases larger than 4 TB should use Hyperscale. | Databases needing the highest availability and protection against the most failure modes should use Premium/Business Critical. This model allows the database to function with the fewest dependencies on other systems and protects against, for example, the outage of a storage account. Hyperscale supports databases greater than what can be hosted on a single machine (currently 4 TB) and is the only option for hosting larger databases. |
| Managed Instance vs. Non-Managed Instance | Applications migrating from SQL Server that require MI-specific features (CLR, SQL Agent, etc.) should use MI. | The starting price point for MI is higher than the other options (4 vCores), but it has lower friction for migrating existing SQL Server databases. |
| Elastic Pools | Customers with multiple databases that can share resources should use elastic pools. | ISVs or systems hosting many databases where the collective usage patterns are known can often save money compared to the per-database provisioning model by allocating resources for the peak load of the set of databases instead of each one separately. |

 

Within each tier, there are also various sizing options available. Customers often ask how to size a database in Azure. While it is critical to size databases correctly in the on-premises world (since you usually buy the hardware), in Azure SQL Database you can resize dynamically in an online fashion. Therefore, it is usually good enough to pick a somewhat larger size initially and then resize down once you have the workload running in the cloud. That said, it is still worth reviewing the available material to help make informed choices.
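
A resize can also be requested programmatically. The following is a minimal sketch using the azure-mgmt-sql Python SDK; the resource names, region, and SKU are placeholders, and the exact method names and SKU format should be verified against the SDK version you install.

```python
# Minimal sketch, assuming the azure-identity and azure-mgmt-sql packages.
# Resource group, server, database, region, and SKU values are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.sql import SqlManagementClient

client = SqlManagementClient(DefaultAzureCredential(), "<subscription-id>")

# Request a different reservation size (here: General Purpose, Gen5, 4 vCores).
# The resize runs online; existing connections may drop briefly when it completes.
poller = client.databases.begin_create_or_update(
    resource_group_name="my-resource-group",
    server_name="my-sql-server",
    database_name="my-database",
    parameters={
        "location": "eastus",
        "sku": {"name": "GP_Gen5", "tier": "GeneralPurpose", "capacity": 4},
    },
)
print(poller.result().sku.name)
```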

The current limits for each resource governed in Azure SQL Database are published here:

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-dtu-resource-limits-single-database...

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-vcore-resource-limits-single-databa...

Also, the Database Migration Assistant can help you get a sizing estimation for an existing on-premises workload. You can read more about that here:

https://docs.microsoft.com/en-us/sql/dma/dma-overview?view=sql-server-2017

 

The Availability SLAs also differ for the options in Azure SQL Database. The current SLA is listed here:

https://azure.microsoft.com/en-us/support/legal/sla/sql-database/v1_4/

When using any infrastructure, it is important to understand the potential for failure of each component on which you take a dependency. Every system, from light switches to telephones, has some expected failure rate beyond which it does not promise to work. This is usually measured in the number of 9s, and more 9s generally cost more. A 99.9% available solution is rated for up to about 43 minutes of downtime per month, while a 99.99% available solution may have up to 4-5 minutes of downtime per month. These numbers are not exact; in practice, database uptime is often better than the stated availability SLA.
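
Translating a given number of 9s into allowed downtime is simple arithmetic over a 30-day month:

```python
# Allowed downtime per 30-day month for a few common SLA levels.
MINUTES_PER_MONTH = 30 * 24 * 60  # 43,200

for sla in (0.999, 0.9999, 0.99995):
    downtime = MINUTES_PER_MONTH * (1 - sla)
    print(f"{sla:.3%} available -> up to {downtime:.1f} minutes of downtime per month")
```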

One area we regularly discuss with customers is how to determine what SLA is needed for a given application. When a solution depends on multiple components (for example, DNS + Azure SQL Database + Azure Storage), it is important to understand that they don't all necessarily fail at the same time, even if they have the same SLA. Planning for failure is therefore a key aspect of designing resilient systems.

While a proper discussion of how to architect data applications to maximize uptime is a much broader conversation, it is possible to reason about how to maximize uptime for a single database based on the compute/service offering you pick. If your database is critical to your application, it is probably better to pick the Premium/Business Critical service tiers. While this is more expensive, the internal architecture (local storage) used in these models isolates your database from more failure modes than the General Purpose (remote storage) tiers. For example, if a storage account has a problem and goes offline, locally hosted databases remain up, while a remote storage database hosted on that storage account will not be available until the storage account recovers. While these situations are rare, they do happen, and it is important to understand and plan for them when choosing the right options for your Azure SQL Database.

Azure SQL DB does have mechanisms to help restore a database to a functioning state after an extended outage (to the same region, or even a different region if a whole region is unavailable), such as Geo-Restore and Active Geo-Replication. However, business-critical databases are designed to have fewer cases where you would need to rely on them. Therefore, if a database is truly critical to your workload, we recommend keeping it on the Business Critical service tier, since this minimizes potential downtime.
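
For the multi-component point above, a rough rule of thumb is that the composite availability of independent dependencies is the product of their individual SLAs, which is always lower than any single one of them. The SLA values below are illustrative only:

```python
# Illustrative SLA values only -- not the published figures for each service.
dependencies = {"DNS": 0.9999, "Azure SQL Database": 0.9995, "Azure Storage": 0.999}

composite = 1.0
for service, sla in dependencies.items():
    composite *= sla

print(f"Composite availability: {composite:.4%}")  # lower than any single SLA
```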

The Availability SLA we publish reflects this difference in the internal failure mode analysis. Azure SQL Database provides four and a half nines (a 99.995% uptime SLA) for databases using the Business Critical tier with Availability Zones (which spread the replicas across multiple datacenter buildings to minimize the chance of a network outage impacting your database availability). Therefore, we recommend customers use Availability Zones and the Business Critical tier for any database where outages need to be strictly minimized.

The following links provide more information about how Availability Zones work:

https://docs.microsoft.com/en-us/azure/availability-zones/az-overview

https://azure.microsoft.com/en-us/global-infrastructure/availability-zones/

This link explains how the internal architecture of Azure SQL Database provides high availability with local replicas in Availability Zones:

https://docs.microsoft.com/en-us/azure/sql-database/sql-database-high-availability

(Note that you can also utilize Active Geo-Replication for disaster recovery in different regions in Azure SQL Database, and we recommend that any production database have a disaster recovery plan as well).
