First published on MSDN on Nov 08, 2017
As Internet of Things (IoT) devices and sensors become more ubiquitous in consumer, business, and industrial landscapes, they introduce a unique challenge in the volume of data they produce and the velocity at which they produce it. The challenge is to ingest and analyze this data in real time, at the speed at which it is generated. Azure SQL Database with In-Memory OLTP and Columnstore technologies is well suited to ingesting large volumes of data from many different sources at the same time, while providing the ability to analyze current and historical data in real time.
The following sample demonstrates the scale and performance of SQL Database: it inserts 1.4 million rows per second by using a memory-optimized table to speed up data ingestion, while managing the In-Memory OLTP storage footprint by offloading historical data to a disk-based Columnstore table for real-time analytics. One customer already using Azure SQL Database for its entire IoT solution is Quorum International Inc., which was able to double its key database's workload while lowering its DTU consumption by 70%.
Sample release and source code:
High Level Architecture
Ingesting Data: IoT sensor data loader
A multi-threaded data loader application (running on a Standard DS15 v2 Azure VM) generates sensor readings and inserts them directly into the dbo.MeterMeasurement memory-optimized table through a natively compiled stored procedure that accepts a memory-optimized table-valued parameter. The application is also responsible for offloading historical data to a disk-based Columnstore table to manage the In-Memory storage footprint. Below is a screenshot of the data simulator inserting 1.4M rows per second into a single P15 Premium (4000 DTUs) Azure SQL Database.
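The shape of such a loader can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the sample's actual loader: the reading layout (meter id, kWh value, timestamp), the `CountingSink` class, and all function names are hypothetical, and the sink merely counts rows where a real loader would pass each batch to the stored procedure as a table-valued parameter.

```python
import random
import threading
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def make_batch(meter_ids, batch_size, rng):
    """Generate one batch of hypothetical (meter_id, kWh, timestamp) readings."""
    now = datetime.now(timezone.utc)
    return [
        (rng.choice(meter_ids), round(rng.uniform(0.0, 5.0), 4), now)
        for _ in range(batch_size)
    ]


class CountingSink:
    """Stand-in for the database call (assumption): it only counts rows."""

    def __init__(self):
        self._lock = threading.Lock()
        self.rows = 0
        self.batches = 0

    def insert_batch(self, batch):
        # A real loader would send the batch to the database here.
        with self._lock:
            self.rows += len(batch)
            self.batches += 1


def run_loader(sink, workers=4, batches_per_worker=5, batch_size=1000, seed=0):
    """Run several worker threads, each sending whole batches to the sink."""
    meter_ids = list(range(1, 101))

    def worker(worker_no):
        rng = random.Random(seed + worker_no)  # one generator per thread
        for _ in range(batches_per_worker):
            sink.insert_batch(make_batch(meter_ids, batch_size, rng))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces evaluation so that worker exceptions are raised here.
        list(pool.map(worker, range(workers)))
    return sink.rows
```

Sending whole batches rather than single rows is what lets each call carry many readings in one round trip, which is the role the table-valued parameter plays in the sample.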
Offloading Data: Bulk load historical data into a clustered Columnstore index
Historical data is offloaded from the In-Memory table to a disk-based Columnstore table to manage the In-Memory storage footprint. A stored procedure is called asynchronously by multiple threads to offload historical data into a disk-based table with a clustered Columnstore index. Note that if the batch size is >= 102,400 rows, the rows are loaded directly into compressed rowgroups. Smaller batches land in a delta rowgroup first and are only later moved into compressed rowgroups by a background thread, the Tuple Mover (TM). For efficient bulk import, it is therefore recommended that you always choose a batch size >= 102,400. Please refer to the Columnstore indexes - data loading guidance documentation for further details.
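To make the batch-size rule concrete, here is a minimal Python sketch that splits a number of rows into offload batches and labels where each batch would land. The function name and the labels are hypothetical. The 102,400-row threshold comes from the guidance above, and 1,048,576 is the maximum number of rows in a Columnstore rowgroup.

```python
COMPRESSED_THRESHOLD = 102_400  # batches at or above this skip the delta rowgroup
MAX_ROWGROUP_ROWS = 1_048_576   # maximum number of rows in one rowgroup


def plan_offload_batches(total_rows, batch_size=MAX_ROWGROUP_ROWS):
    """Split total_rows into batches and label where each batch would land."""
    if not COMPRESSED_THRESHOLD <= batch_size <= MAX_ROWGROUP_ROWS:
        raise ValueError("batch_size must be between 102,400 and 1,048,576 rows")
    batches = []
    remaining = total_rows
    while remaining > 0:
        size = min(batch_size, remaining)
        target = "compressed" if size >= COMPRESSED_THRESHOLD else "delta"
        batches.append((size, target))
        remaining -= size
    return batches


print(plan_offload_batches(250_000, batch_size=102_400))
# [(102400, 'compressed'), (102400, 'compressed'), (45200, 'delta')]
```

Only the final partial batch falls below the threshold here. One possible design choice is to hold such a tail back until enough rows have accumulated, so that it also loads directly into a compressed rowgroup.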
Below is a sample output captured during the data load. Note that offloaded data in the Columnstore table lands directly in a compressed state, achieving up to 10x query performance gains over traditional row-oriented storage, and up to 10x data compression over the uncompressed data size.
Durability Options for the Memory-Optimized table
Based on your workload characteristics and the nature of your application, you can choose between the following two durability options for the memory-optimized table that handles the ingestion:
DURABILITY = SCHEMA_ONLY, which achieves higher ingestion rates because the table has no on-disk representation and therefore involves no IO, but offers no data persistence.
DURABILITY = SCHEMA_AND_DATA, which provides both schema and data persistence, but lower ingestion rates, primarily due to logging.
In this sample, the memory-optimized table dbo.MeterMeasurement is created with DURABILITY = SCHEMA_ONLY, meaning that if SQL Server restarts or a reconfiguration occurs in Azure SQL Database, the table schema persists, but any data in the table that has not yet been offloaded to the disk-based Columnstore table is lost. To learn more about the two durability options for memory-optimized tables, we recommend reading Defining Durability for Memory-Optimized Objects.
Metrics: CPU, Log, Memory, and xtp_storage from sys.dm_db_resource_stats
Below is a snapshot of CPU, Log, Memory, and xtp_storage metrics from sys.dm_db_resource_stats after a 25-minute run. The xtp_storage metric, which is the storage utilization for In-Memory OLTP as a percentage of the service tier limit, never exceeded 7% during this run (which uses a memory-optimized table), meaning that the offloading to Columnstore keeps up with the data ingestion of 1.4M rows per second. Overall memory stayed consistently below 40% and CPU below 60%, while the log write percentage reached 100% during the offloading.
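As a back-of-the-envelope check on what these figures imply, the arithmetic below assumes the 1.4M rows per second rate is sustained for the full 25 minutes and that offloading uses the minimum 102,400-row batch size. Both are simplifying assumptions, and the function names are hypothetical.

```python
ROWS_PER_SECOND = 1_400_000     # ingestion rate reported for the sample
RUN_MINUTES = 25                # length of the run
MIN_COMPRESSED_BATCH = 102_400  # smallest batch loaded directly as compressed


def total_rows(rows_per_second, minutes):
    """Rows ingested if the rate is sustained for the whole run (assumption)."""
    return rows_per_second * minutes * 60


def batches_per_second(rows_per_second, batch_size):
    """Offload batches per second needed to keep pace with ingestion."""
    return rows_per_second / batch_size


print(f"{total_rows(ROWS_PER_SECOND, RUN_MINUTES):,} rows")
# 2,100,000,000 rows
print(round(batches_per_second(ROWS_PER_SECOND, MIN_COMPRESSED_BATCH), 2))
# 13.67
```

Under these assumptions, roughly 2.1 billion rows pass through the memory-optimized table during the run, and the offload threads together need to move about 14 minimum-size batches every second to keep the In-Memory storage footprint flat.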