Azure SQL Database: Ingesting 1.4 million sustained rows per second with In-Memory OLTP & Columnstore Index
First published on MSDN on Nov 08, 2017
As Internet of Things (IoT) devices and sensors become more ubiquitous in consumer, business, and industrial landscapes, they introduce a unique challenge: the volume of data they produce, and the velocity with which they produce it. The challenge is to ingest and analyze this data in real time, at the speed at which it is generated. Azure SQL Database, with its In-Memory OLTP and Columnstore technologies, is phenomenal at ingesting large volumes of data from many different sources at the same time, while providing the ability to analyze current and historical data in real time.

The following sample demonstrates the high scale and performance of SQL Database: the ability to insert 1.4 million rows per second by using a non-durable memory-optimized table to speed up data ingestion, while managing the In-Memory OLTP storage footprint by offloading historical data to a disk-based Columnstore table for real-time analytics. One of the customers already leveraging Azure SQL Database for their entire IoT solution is Quorum International Inc., which was able to double its key database's workload while lowering DTU consumption by 70%.

Sample release and source code: Release | source code
High-Level Architecture

Ingesting Data: IoT sensor data loader
A multi-threaded data loader application (running on a Standard DS15 v2 Azure VM) generates sensor readings and inserts them directly into the dbo.MeterMeasurement schema-only memory-optimized table through the dbo.InsertMeterMeasurement natively compiled stored procedure, which accepts a memory-optimized table-valued parameter. The application is also responsible for offloading historical data to a disk-based Columnstore table to manage the In-Memory OLTP storage footprint. Below is a screenshot of the data simulator inserting 1.4M rows per second into a single P15 Premium (4000 DTU) Azure SQL Database.
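The authoritative definitions ship with the sample's source code; the following is a minimal sketch of what the schema-only table, the memory-optimized table type, and the natively compiled procedure could look like. The column names, types, and bucket count shown here are illustrative assumptions, not the sample's actual schema.

-- Schema-only (non-durable) memory-optimized table that receives the sensor readings.
-- Column names and types are illustrative assumptions.
CREATE TABLE dbo.MeterMeasurement
(
    MeasurementId    BIGINT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000000),
    MeterId          INT            NOT NULL,
    MeasurementDate  DATETIME2(7)   NOT NULL,
    MeasurementValue DECIMAL(18, 4) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY);
GO

-- Memory-optimized table type used as the table-valued parameter sent by the loader.
CREATE TYPE dbo.MeterMeasurementType AS TABLE
(
    MeterId          INT            NOT NULL,
    MeasurementDate  DATETIME2(7)   NOT NULL,
    MeasurementValue DECIMAL(18, 4) NOT NULL,
    INDEX ix_MeterId NONCLUSTERED (MeterId)
)
WITH (MEMORY_OPTIMIZED = ON);
GO

-- Natively compiled stored procedure that inserts one batch of readings.
CREATE PROCEDURE dbo.InsertMeterMeasurement
    @Batch dbo.MeterMeasurementType READONLY
WITH NATIVE_COMPILATION, SCHEMABINDING
AS
BEGIN ATOMIC WITH (TRANSACTION ISOLATION LEVEL = SNAPSHOT, LANGUAGE = N'us_english')
    INSERT INTO dbo.MeterMeasurement (MeterId, MeasurementDate, MeasurementValue)
    SELECT MeterId, MeasurementDate, MeasurementValue
    FROM @Batch;
END;
GO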


Offloading Data: Bulk loading historical data into a clustered Columnstore index
To manage the In-Memory OLTP storage footprint, historical data is offloaded from the memory-optimized table to a disk-based table with a clustered Columnstore index. The dbo.InsertMeterMeasurementHistory stored procedure is called asynchronously by multiple threads to perform this offload. Note that when the batch size is >= 102,400 rows, the rows are loaded directly into compressed rowgroups; choosing a batch size of at least 102,400 is therefore recommended for efficient bulk import, because it avoids placing rows in delta rowgroups, where they would otherwise wait to be moved into compressed rowgroups by the background tuple-mover (TM) thread. Refer to the Columnstore indexes - data loading guidance documentation for further details.
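The history table and offload procedure in the sample's source code are the authoritative versions; a simplified sketch of the pattern, reusing the illustrative columns from the earlier sketch, could look like the following. The column list and the batching logic are assumptions for illustration only.

-- Disk-based history table with a clustered Columnstore index.
CREATE TABLE dbo.MeterMeasurementHistory
(
    MeasurementId    BIGINT         NOT NULL,
    MeterId          INT            NOT NULL,
    MeasurementDate  DATETIME2(7)   NOT NULL,
    MeasurementValue DECIMAL(18, 4) NOT NULL,
    INDEX cci_MeterMeasurementHistory CLUSTERED COLUMNSTORE
);
GO

-- Moves one batch from the memory-optimized table into the Columnstore table.
-- Batches of at least 102,400 rows can bypass the delta rowgroups and be compressed directly.
CREATE PROCEDURE dbo.InsertMeterMeasurementHistory
    @BatchSize INT = 200000
AS
BEGIN
    SET NOCOUNT ON;

    -- Delete a batch from the in-memory table and insert the deleted rows
    -- into the history table in a single atomic statement.
    DELETE TOP (@BatchSize)
    FROM dbo.MeterMeasurement WITH (SNAPSHOT)
    OUTPUT deleted.MeasurementId, deleted.MeterId, deleted.MeasurementDate, deleted.MeasurementValue
        INTO dbo.MeterMeasurementHistory (MeasurementId, MeterId, MeasurementDate, MeasurementValue);
END;
GO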

Below is a sample output from sys.dm_db_column_store_row_group_physical_stats during the data load. Note that the offloaded data lands in the Columnstore index directly in the COMPRESSED state, achieving up to 10x query performance gains over traditional row-oriented storage and up to 10x data compression over the uncompressed data size.
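The rowgroup states can be inspected with a query along these lines; it shows, per rowgroup, the current state and how the rowgroup transitioned to the COMPRESSED state (the table name follows the illustrative history table sketched above):

SELECT OBJECT_NAME(object_id) AS table_name,
       row_group_id,
       state_desc,
       total_rows,
       trim_reason_desc,
       transition_to_compressed_state_desc
FROM sys.dm_db_column_store_row_group_physical_stats
WHERE object_id = OBJECT_ID('dbo.MeterMeasurementHistory')
ORDER BY row_group_id;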

Durability Options for the Memory-Optimized Table
Based on your workload characteristics and the nature of your application, you can choose between the following two durability options for the memory-optimized table that handles the ingestion:

Non-durable (DURABILITY = SCHEMA_ONLY), which achieves higher ingestion rates because no I/O is involved: the table has no on-disk representation, so it does not offer data persistence.
Durable (DURABILITY = SCHEMA_AND_DATA), which provides both schema and data persistence, but lower ingestion rates, primarily due to transaction logging.

In this sample, the memory-optimized table dbo.MeterMeasurement is created with DURABILITY = SCHEMA_ONLY, meaning that if a SQL Server restart or a reconfiguration occurs in Azure SQL Database, the table schema persists, but the data in the table (any rows not yet offloaded to the disk-based Columnstore table) is lost. To learn more about the two durability options for memory-optimized tables, see Defining Durability for Memory-Optimized Objects.
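For reference, the durability option is specified in the WITH clause of CREATE TABLE; a durable variant of the illustrative table sketched earlier would look like this (dbo.MeterMeasurementDurable is a hypothetical name used only for the contrast):

-- Durable variant: both schema and data survive a restart or reconfiguration,
-- at the cost of transaction log I/O on every insert.
CREATE TABLE dbo.MeterMeasurementDurable
(
    MeasurementId    BIGINT IDENTITY(1,1) NOT NULL
        PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 10000000),
    MeterId          INT            NOT NULL,
    MeasurementDate  DATETIME2(7)   NOT NULL,
    MeasurementValue DECIMAL(18, 4) NOT NULL
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);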
Metrics: CPU, Log, Memory, and xtp_storage from sys.dm_db_resource_stats
Below is a snapshot of the CPU, log, memory, and xtp_storage metrics from sys.dm_db_resource_stats after a 25-minute run. The xtp_storage_percent metric, which reports the In-Memory OLTP storage utilization as a percentage of the service tier limit, never exceeds 7% during this run (which uses a non-durable memory-optimized table), meaning that the offloading to Columnstore keeps up with the data ingestion of 1.4M rows per second. Overall memory usage stays consistently below 40%, CPU below 60%, and log write percent reaches 100% during the offloading.
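These metrics come straight from the DMV; a query along these lines returns one row per 15-second interval for roughly the last hour of activity:

-- Resource consumption for the current database, one row per 15-second interval.
SELECT end_time,
       avg_cpu_percent,
       avg_log_write_percent,
       avg_memory_usage_percent,
       xtp_storage_percent
FROM sys.dm_db_resource_stats
ORDER BY end_time DESC;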

Further Reading
A Technical Case Study: High Speed IoT Data Ingestion Using In-Memory OLTP in Azure
Get started with In-Memory OLTP in Azure SQL Database
Speeding up transactions with In-Memory OLTP
In-Memory OLTP in Azure SQL Database
Columnstore indexes - overview
Improving temp table and table variable performance using memory optimization
Official Microsoft GitHub Repository containing code samples for SQL Server