How to get the biggest bang for your buck with SQL Server on Azure VMs
Published Mar 29 2021 09:00 AM

Running SQL Server on an Azure virtual machine is one of the quickest and easiest ways to lift-and-shift your SQL workloads into the cloud. The great thing about SQL Server on Azure Virtual Machines is that it’s the same SQL Server database engine you’re used to running on-premises. For the most part, configuration, settings, maintenance, and performance tuning techniques are all going to be the same, but infrastructure configuration such as choosing the right VM size and designing a storage layout are going to be quite different. The choices you make when creating and configuring a SQL Server VM in Azure can have a dramatic impact on performance and cost, potentially improving performance by 35% or more while decreasing the cost. No one knows SQL Server like Microsoft, so it should be no surprise that Azure offers the best overall value for running your SQL Server workloads in the cloud.


There are many benefits to running your SQL Server workloads on Azure VMs:

  • SQL IaaS Agent Extension adds an array of manageability features for your SQL Server VMs at no additional cost.
  • Azure Hybrid Benefit allows you to leverage your existing on-premises Software Assurance investment in Azure, including the value-add of free HA/DR instances for business continuity.
  • Security Center helps keep your SQL Server secure with Azure Defender for SQL.
  • Competitive price-performance for your modern SQL Server transactional workload, as demonstrated in our recent GigaOm performance benchmark.

So how do you get the best price-performance possible when configuring your SQL Server on Azure VM? In this blog, we’re going to cover three key aspects to right-sizing (and right-configuring) your Azure VM for SQL Server that are based on some common pitfalls customers face when migrating their on-premises workloads to Azure VM:

  • Choosing the best VM series and size for your workload
  • Configuring storage for maximum throughput and lower cost
  • Leveraging Azure-unique features such as host caching to boost performance at no additional cost

To help illustrate how small choices can have a big impact on price-performance, we conducted an internal experiment against two different VMs from our Memory Optimized series that are commonly used to host SQL Server workloads. Here are some specifications for the two VMs:



* This cost is for compute only with a 3-year reservation, assuming Azure Hybrid Benefit for both Windows OS and SQL Server licensing costs.


Note that these two VMs are very similar – they both have the same core count, RAM, and I/O throughput, but the E64-32ds_v4 VM adds a fast local SSD that can be used both for temporary (ephemeral) storage within the VM and for host caching, also known as Azure BlobCache. Host caching is a great way to boost your performance at no additional cost, provided that you choose a VM size that supports it. This is particularly useful for SQL Server workloads, which tend to be I/O-intensive and read-heavy. In this case, the VM that supports caching is slightly more expensive, but as you’ll see in a moment, a little goes a LONG way when it comes to I/O throughput.


Speaking of I/O, how about storage configuration? You have a number of storage choices for Azure VMs, but for SQL Server workloads we recommend managed disks, either Premium SSD or Azure Ultra Disk. Ultra Disk provides high throughput at very low latency, but most SQL Server customers find Premium SSD meets their needs if it’s configured correctly. We generally recommend either P30 or P40 drives, as these support host caching and offer reservation pricing, which can help with overall cost.


Coming back to our experiment, let’s assume that we have measured our on-prem workload and estimate that we’ll need about 3 GB/s of I/O throughput. You can’t get this from a single managed disk, so we’ll need to create a storage pool to combine the throughput of multiple Premium SSDs. There are a few different ways we can do this, but generally a large number of small disks gives better price-performance than a small number of large disks. Here’s a comparison; both of these configurations assume a 1-year reservation:
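To put rough numbers on that sizing, here’s a quick back-of-the-envelope sketch. The per-disk throughput caps used below (roughly 200 MB/s for P30 and 250 MB/s for P40) are approximate published figures – verify the current limits in the Azure managed disk documentation, and remember the VM’s own uncached throughput cap applies on top of the disk totals:

```python
import math

# Approximate per-disk specs for Azure Premium SSD tiers
# (check current Azure documentation for exact figures).
disk_specs = {
    "P30": {"size_tib": 1, "throughput_mbps": 200},
    "P40": {"size_tib": 2, "throughput_mbps": 250},
}

def disks_for_throughput(tier: str, target_mbps: int) -> int:
    """Minimum number of disks of a given tier needed to reach a
    target aggregate throughput when striped in a storage pool."""
    per_disk = disk_specs[tier]["throughput_mbps"]
    return math.ceil(target_mbps / per_disk)

target = 3000  # ~3 GB/s, the throughput estimated for the workload
for tier in disk_specs:
    n = disks_for_throughput(tier, target)
    cap_tib = n * disk_specs[tier]["size_tib"]
    agg = n * disk_specs[tier]["throughput_mbps"]
    print(f"{tier}: {n} disks -> {agg} MB/s, {cap_tib} TiB")
    # P30: 15 disks -> 3000 MB/s, 15 TiB
    # P40: 12 disks -> 3000 MB/s, 24 TiB
```

This is the “many small disks” intuition in miniature: the small-disk pool hits the throughput target with far less (billable) capacity than the large-disk pool.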






Note that the second configuration has a lot more drives, but the throughput is about the same. The capacity is lower, but we’re sizing for throughput here, not capacity. The price difference is pretty staggering. If you do need more capacity, you can go with larger drives – P40s would give you 32 TiB at $3,937.33/month and still support caching.
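As a sanity check on that capacity figure (a P40 is 2 TiB per disk, a detail not stated above – confirm against current Azure docs), the arithmetic works out like this:

```python
# P40 Premium SSDs are 2 TiB each, so the quoted 32 TiB implies 16 disks.
p40_size_tib = 2
p40_count = 32 // p40_size_tib          # 16 disks
monthly_total = 3937.33                 # 1-year reservation price quoted above
per_disk = monthly_total / p40_count    # implied per-disk monthly cost
aggregate_mbps = p40_count * 250        # ~250 MB/s per P40 (approximate cap)
print(p40_count, round(per_disk, 2), aggregate_mbps)  # 16 246.08 4000
```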

For our experiment, 16 TiB was more than enough storage, but we don’t want a single pool because we want to be able to configure the pools separately for data and log files. We created two pools as follows:





Note the cache settings. Writes in SQL Server are flagged as write-through, which means they always go directly to storage to guarantee durability. This means writes can’t take advantage of host caching. Since the transaction log is mostly writes, there’s no benefit to host caching there – in fact, enabling it on log drives can cause unnecessary overhead.


Data files on the other hand are read-heavy, even for most OLTP systems. Reads in SQL Server can take advantage of host caching, and we find that for many customers around 30% of data file reads will get satisfied from the cache, which frees up remote storage bandwidth for writes and other I/O. Remember that turning on host caching is free, so if you’re not configuring your storage and VM to take advantage of it, you’re leaving money (and I/O throughput) on the table!
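To see what that ~30% hit rate is worth, here’s a sketch with a hypothetical 80/20 read/write workload – the throughput numbers are illustrative, not measured:

```python
def remote_io_after_cache(read_mbps, write_mbps, cache_hit_ratio):
    """Remote (uncached) disk traffic once ReadOnly host caching
    absorbs a fraction of data-file reads. SQL Server writes are
    write-through, so they always hit remote storage."""
    remote_reads = read_mbps * (1 - cache_hit_ratio)
    return remote_reads + write_mbps

# Hypothetical 80/20 read/write mix (illustrative numbers only)
reads, writes = 2400, 600           # MB/s
before = reads + writes
after = remote_io_after_cache(reads, writes, 0.30)  # ~30% hits, per the text
print(before, round(after))         # 3000 2280
```

In this sketch, roughly 720 MB/s of reads are served from the local cache, leaving that much remote-storage headroom free for writes and other I/O.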


OK, so let’s get to the fun part. We used HammerDB to execute a TPC-C¹-style workload against both of our VM configurations. Normally for SQL Server benchmarks we use TPC-E because it’s a more realistic workload, but HammerDB is a common tool used by many database professionals to stress and benchmark systems, so we thought it would be a good choice for this informal experiment in case you want to run something like this yourself. Check out the results. The performance metric used for these tests is New Orders Per Minute (NOPM):



* This cost is for compute only with a 3-year reservation, assuming Azure Hybrid Benefit for both Windows OS and SQL Server licensing costs.

Just by following the best practices we outlined above, we were able to get 34% MORE performance at 50% of the cost!


These are pretty amazing results, but we’re doing a benchmark-style test here – can you really expect these results in your production system with your workload? The answer is yes, and maybe even better than what we see here. The TPC-C workload is designed to stress a system hard, particularly in I/O throughput. Because of this, it’s heavily weighted toward writes compared with typical production workloads: about 2:1 reads to writes, or 67% read / 33% write. Most SQL Server workloads, even busy OLTP workloads, are closer to 80–90% read / 10–20% write, so you may see an even bigger benefit from read caching, particularly if you can downsize to a lower-cost VM with a smaller memory footprint.
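A quick illustration of why the read mix matters, reusing the ~30% cache hit ratio mentioned earlier (the read-share percentages are the ones from the text; the hit ratio is a simplifying assumption applied uniformly):

```python
def cacheable_fraction(read_share, hit_ratio=0.30):
    """Share of total I/O that ReadOnly host caching can absorb,
    given a workload's read share and an assumed cache hit ratio."""
    return read_share * hit_ratio

tpcc = cacheable_fraction(0.67)   # TPC-C-style mix: 67% read
oltp = cacheable_fraction(0.90)   # busy OLTP at the high end: 90% read
print(round(tpcc, 3), round(oltp, 2))  # 0.201 0.27
```

Even with the same hit ratio, a more read-heavy workload lets the cache absorb a larger share of total I/O, which is why production systems can do better than the benchmark numbers here.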


So whether you’re already running SQL Server workloads in Azure VMs, are in the process of migrating your on-prem workloads to Azure, or just considering your options, be sure to check out the following resources to help you get the best performance for your cloud dollar in Azure:

Checklist: Performance best practices & guidelines - SQL Server on Azure VM | Microsoft Docs

Deep Dive: Azure SQL Virtual Machine Sizing | Data Exposed Live - YouTube

Eight ways to optimize costs on Azure SQL | Azure Blog and Updates | Microsoft Azure


¹The HammerDB TPC-C workload is derived from the TPC-C Benchmark and as such is not comparable to published TPC-C Benchmark results, as the HammerDB TPC-C workload results do not fully comply with the TPC-C Benchmark.


Version history
Last update: Nov 10 2021 01:58 PM