Blog Post

Azure SQL Blog
7 MIN READ

Scaling up an IoT workload using an M-series Azure SQL database

Dimitri_Furman's avatar
Jan 13, 2020

Executive summary

  • M-series provides close to 30 GB of memory per core and up to 128 cores, enabling significant scale-up capability in Azure SQL Database.
  • For the same IoT workload, we achieved 3 times higher throughput on M-series compared to P15, at sustained 4.2 million rows per second ingested on average.

New hardware options in Azure SQL Database

At the Ignite conference in November 2019, we introduced the preview of two new hardware options in Azure SQL Database, in addition to the current widely adopted Gen4 and Gen5 hardware. It is now possible to run SQL Server workloads on Azure M-series and Fsv2-series virtual machines within the Azure SQL Database PaaS service. This blog will closely examine the M-series database SKU, and present a scenario showcasing this new hardware option.

 

Using M-series VMs in Azure SQL Database addresses a long-standing customer request to provide a database SKU with high memory/core ratio, while continuing to offer low latency and high throughput local SSD storage and Accelerated Networking capabilities of the Business Critical service tier. M-series has close to 30 GB of memory per core, compared to 7 GB/core for Gen4 and 5.1 GB/core for Gen5. Further, with its 128 maximum cores, M-series provides the highest number of cores among all Azure SQL Database SKUs, compared to the previous maximum of 80 cores on Gen5. With these specifications, M-series databases are very well suited for the most demanding Azure SQL Database and SQL Server workloads that require vertical scalability, or a scale-up rather than a scale-out approach.

 

IoT data ingestion workload

To put this claim to test, we ran one such workload on an M-series database. Back in 2017, we published a blog post describing a typical IoT workload, where data is ingested into a database from millions of simulated electrical power grid meters. The results described were remarkable: the workload achieved sustained ingestion of 1.4 million rows per second. This high level of throughput was made possible by two state-of-the-art technologies in SQL Server. One is In-Memory OLTP, used to ingest data in memory at high concurrency and without lock and latch contention, following the shock absorber pattern. The other is columnstore indexes, used to offload and compress data from memory to disk for durability and efficient querying in analytical applications, and to free up memory for new data. In terms of hardware resources, achieving this level of throughput required a P15 database, the largest SKU Azure SQL Database had at the time.

 

As remarkable as that result was, there were certain resource limits at the database level that did not allow the workload to scale beyond 1.4 million rows per second. One was the memory quota allocated for memory-optimized tables. On a P15 database, this quota is set to 32 GB. This limited the ability of the database to handle spikes in data generation rate, or potential delays in data offloading: if the memory quota were exhausted, the workload would fail with out-of-memory errors. Another limit was on transaction log generation rate: as the original blog mentions, log rate frequently reached 100% of the 42 MB/s limit of a P15 database. Because of these limits, the database couldn’t offload data to columnstore indexes on disk any faster, and couldn’t sustain a higher ingestion rate without running out of available memory.

 

Running on M-series

How does this workload fare on an M-series database, with its quota for memory-optimized tables at 1768 GB (55x higher than on P15, with 20x higher memory/core ratio), and its log throughput rate at 264 MB/s (6.3x higher than on P15)? To find out, we ran multiple instances of the same workload generator application concurrently, loading data into a common M-series database.

 

The goal of this test was to find out the extent of vertical scalability improvement achieved simply by using more capable hardware. For that reason, we have not modified workload parameters in any way, letting each copy of the workload generator produce the same 1.4 million rows per second. We measured data ingestion rate by polling the sys.dm_db_xtp_object_stats DMV in the database at five second intervals.

 

In this configuration, we achieved 3 times higher throughput on M-series compared to P15, at sustained 4.2 million rows per second on average.

 

 

The screenshot below shows three workload generator windows stacked vertically, each generating ~1.4 million rows per second, for the total of 4.2 million rows:

 

 

What is notable is that none of the resource limits of the M-series database is approached during this test, leaving ample headroom for other concurrent workloads against the database that would likely be present in a real customer scenario:

 

 

CPU utilization is between 60% and 70% on average, average log generation rate is close to 20%, and memory utilization for memory-optimized tables is below 3% of the quota. The latter point is important, because it shows that the database has enough resources to absorb large spikes in data generation rate without running out of memory.

 

A more detailed view of database performance for the same period can be seen on a Grafana dashboard provided by a monitoring solution built on an open source monitoring stack, described in our earlier blog post:

 

 

Sustaining spikes in data generation rate

For how long can an M-series database sustain increased data generation rate before running out of memory? To find out, we disabled data offloading to disk, to have all new data accumulate in the schema-only memory-optimized table. Then, we configured the workload generator application to produce as much data per second as it could. With just one copy of workload generator running, we achieved data ingestion rate of over 5.1 million rows per second.

 

 

 

 

Obviously, since data offloading is not running, this rate cannot be sustained indefinitely. At this data ingestion rate, a P15 database runs out of memory in about a minute. However, with its 55x larger quota for memory-optimized tables, an M-series database can sustain this rate for an hour and longer, also making the system much more resilient to any slowdowns in data offloading.

The Grafana dashboard below shows database resource usage during this simulated spike in data generation, showing memory consumption by the In-Memory OLTP clerk growing to 1.3 TB, which is about 75% of the memory quota for memory-optimized tables.

 

 

Columnstore auto-merge

Since this IoT workload uses columnstore indexes for all persisted data, it is worthwhile mentioning a recent columnstore improvement. Customers who have been using columnstore indexes extensively know that to achieve optimal query performance and compression using columnstore indexes, the size of each rowgroup should be as close as possible to the maximum and optimal size of 1,048,576 rows. When loading data into a columnstore index, batch size should ideally be set to this value, and should be at least 102,400 rows to load data directly into compressed rowgroups. For details, see Columnstore indexes - Data loading guidance.

 

When the workload generator application runs a data offloading task to move data from the memory-optimized table to the clustered columnstore table, it picks a subset of data generated by an individual power grid meter. Occasionally, it can happen that the remaining number of rows for a given meter in the memory-optimized table is less than the size of the offloading batch. At the end of the test run, this results in a few hundred rowgroups being relatively small, around 100K rows in size each. This pattern is not uncommon in many data loading scenarios.

 

The way to increase rowgroup size and improve columnstore quality is to reorganize the columnstore index, as described in detail in an earlier SQLCAT blog. This merges multiple smaller rowgroups into fewer larger rowgroups, sized up to 1,048,576 rows. However, reorganizing an index is a resource-intensive operation that a customer must implement on their own.

 

In Azure SQL Database, the need for this specific type of columnstore index maintenance is much reduced, and in many cases eliminated altogether. As part of several recent improvements in columnstore indexes, the background tuple mover thread now automatically merges smaller compressed rowgroups into larger compressed rowgroups. To observe the results, we ran the following query a few hours after the load had stopped:

 

 

SELECT state_desc,
       trim_reason_desc,
       transition_to_compressed_state_desc,
       AVG(size_in_bytes) AS avg_size_in_bytes,
       AVG(total_rows) AS avg_total_rows,
       SUM(total_rows) AS total_rows,
       COUNT(1) AS rowgroup_count
FROM sys.dm_db_column_store_row_group_physical_stats
GROUP BY state_desc,
         trim_reason_desc,
         transition_to_compressed_state_desc
ORDER BY state_desc,
         trim_reason_desc,
         transition_to_compressed_state_desc;

 

 

 

The result shows that hundreds of smaller rowgroups have been auto-merged into twenty five rowgroups, each sized close to one million rows each to provide optimal compression and query performance:

 

 

Conclusion

The Azure SQL Database service is continuously improved to support a wider variety of SQL Server workloads in Azure. The recent addition of M-series as a hardware option unlocks unprecedented vertical scalability in the service. For a closer look at M-series resource limits, see documentation for single databases and elastic pools.

 

M-series has already allowed very demanding CRM and eCommerce workloads migrate to Azure SQL Database and enjoy the benefits of a PaaS service. While during preview we only supported the largest M-series SKU with 128 cores, smaller SKUs starting at 8 vCores for both single databases and elastic pools are now available with General Availability of M-series hardware. This lets a much broader range of customers choose the right sized SKU on M-series for their workloads, and reap the benefits of scalability improvements provided by the high memory/core ratio of this hardware.

Updated Nov 09, 2020
Version 9.0
No CommentsBe the first to comment