The Internet of Things (IoT) technology stack in Operational Technology (OT) is widely used across various industries, including oil & gas, manufacturing, utilities, and natural resources, for solving operational challenges and delivering mission-critical insights and analytics.
More and more organizations are leveraging Microsoft's Azure cloud platform to perform large-scale analytics and machine learning on data from IoT assets, something that was not simple to do with traditional systems such as SCADA and historians.
In this article, we preview an end-to-end Azure Data and AI cloud architecture that enables IoT analytics. This article is based on our 3-part blog series on the Databricks Blog site. You can find more information and code samples starting with
Here is the overall architecture discussed in this article and the Databricks blog series:
To ground this article in a common IoT use case, consider the scenario of balancing the optimal short-term utilization of an asset, such as a wind turbine, against its long-term maintenance costs.
To develop insights on short-term optimization as well as longer-term maintenance costs, various data sources need to be considered and ingested into the cloud for centralized storage and analysis. Here are a few Azure cloud services to consider, depending on whether the data sources are stream or batch processed.
Alternatively, Azure Data Factory can be used to move batch data to the data lake.
Once the data is ingested, processed, and stored in Delta format, Azure Databricks can be used for big data analytics, including data engineering and data science with Spark. A common pattern is to organize the data lake into multiple zones, each holding data at a different level of aggregation:
Bronze for raw granular IoT data
Silver for aggregated data, commonly used for machine learning and data science
Gold for enriched data ready for analytics and reporting purposes
Data engineers can use Azure Databricks to create three Delta tables corresponding to these three zones. Users can work in Python, Scala, R, or SQL in Azure Databricks for accelerated data engineering and data science development.
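As a rough illustration of how the bronze and silver zones relate, the sketch below aggregates raw readings into one row per turbine per hour. It uses plain Python in place of a Spark job on Delta tables, and the field names (`turbine_id`, `ts`, `rpm`, `power_kw`) and sample values are assumptions made for this example, not names taken from the blog series.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical bronze records: one row per raw sensor reading.
bronze = [
    {"turbine_id": "WT-1", "ts": "2024-01-01T00:05:00", "rpm": 10.0, "power_kw": 100.0},
    {"turbine_id": "WT-1", "ts": "2024-01-01T00:35:00", "rpm": 12.0, "power_kw": 140.0},
    {"turbine_id": "WT-1", "ts": "2024-01-01T01:10:00", "rpm": 8.0, "power_kw": 90.0},
    {"turbine_id": "WT-2", "ts": "2024-01-01T00:20:00", "rpm": 9.0, "power_kw": 95.0},
]


def to_silver(rows):
    """Aggregate raw readings to one row per turbine per hour."""
    groups = defaultdict(list)
    for row in rows:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(row["ts"]).replace(minute=0, second=0, microsecond=0)
        groups[(row["turbine_id"], hour)].append(row)

    silver_rows = []
    for (turbine_id, hour), readings in sorted(groups.items()):
        silver_rows.append({
            "turbine_id": turbine_id,
            "hour": hour.isoformat(),
            "avg_rpm": mean(r["rpm"] for r in readings),
            "avg_power_kw": mean(r["power_kw"] for r in readings),
            "reading_count": len(readings),
        })
    return silver_rows


silver = to_silver(bronze)
for row in silver:
    print(row)
```

In Azure Databricks the same grouping would typically be expressed as a Spark aggregation over the bronze Delta table, with the result written to the silver Delta table.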
In this IoT architecture, Azure Machine Learning is most commonly used together with Azure Databricks. For example, Azure Databricks can be used with Spark to engineer features and aggregate data. Azure Machine Learning can then be used to build models through code, drag-and-drop, or even automated machine learning. In addition, Azure Machine Learning can be used to deploy and operationalize machine learning models.
For the wind turbine scenario, the bronze Delta table could hold the granular IoT sensor data from the turbines, while the silver Delta table holds the aggregated data (by the hour, for example). Azure Databricks can then be used to perform feature engineering and feature selection to build a dataset that is ready for machine learning and analytics. This dataset would then be loaded into Azure Machine Learning to build a predictive maintenance model or a power generation prediction model.
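The feature engineering step can be pictured with a small sketch. Assuming hourly silver rows for a single turbine, it derives two illustrative features: the previous hour's power and a trailing rolling mean. The column names, the window size, and the choice of features are assumptions for this example; a real pipeline would choose features based on the model being built.

```python
from statistics import mean

# Hypothetical hourly (silver) rows for one turbine, already ordered by hour.
silver = [
    {"hour": 0, "avg_power_kw": 100.0},
    {"hour": 1, "avg_power_kw": 120.0},
    {"hour": 2, "avg_power_kw": 90.0},
    {"hour": 3, "avg_power_kw": 110.0},
]


def add_features(rows, window=3):
    """Add a one-hour lag and a trailing rolling mean of power to each row."""
    out = []
    for i, row in enumerate(rows):
        # Trailing window ends at the current row and holds at most `window` values.
        trailing = [r["avg_power_kw"] for r in rows[max(0, i - window + 1): i + 1]]
        out.append({
            **row,
            "power_lag_1h": rows[i - 1]["avg_power_kw"] if i > 0 else None,
            "power_roll_mean": mean(trailing),
        })
    return out


features = add_features(silver)
for row in features:
    print(row)
```

The first row has no previous hour, so its lag is left as `None`; how such gaps are handled (dropped, imputed, or kept) is a modeling decision this sketch does not make.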
Finally, once the predictions and enriched data are written to the gold Delta table with Azure Databricks, they can be loaded into Azure Synapse Analytics for BI analytics and reporting scenarios together with Power BI. Separately, Azure Data Explorer provides real-time operational analytics, so IoT data can be streamed directly from IoT Hub or Event Hubs to Data Explorer.
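One way to think about the gold table is as the aggregated data joined with model output. The sketch below does a left join in plain Python, keeping every aggregated row even when no prediction exists for it. The prediction fields (`predicted_power_kw`, `remaining_life_days`) are hypothetical names chosen for illustration, not outputs defined in the blog series.

```python
# Hypothetical hourly (silver) rows.
silver = [
    {"turbine_id": "WT-1", "hour": "2024-01-01T00:00:00", "avg_power_kw": 120.0},
    {"turbine_id": "WT-2", "hour": "2024-01-01T00:00:00", "avg_power_kw": 95.0},
]

# Hypothetical model output, keyed by (turbine_id, hour).
predictions = {
    ("WT-1", "2024-01-01T00:00:00"): {
        "predicted_power_kw": 118.0,
        "remaining_life_days": 40,
    },
}


def to_gold(silver_rows, preds):
    """Left-join predictions onto silver rows; missing predictions become None."""
    gold_rows = []
    for row in silver_rows:
        pred = preds.get((row["turbine_id"], row["hour"]), {})
        gold_rows.append({
            **row,
            "predicted_power_kw": pred.get("predicted_power_kw"),
            "remaining_life_days": pred.get("remaining_life_days"),
        })
    return gold_rows


gold = to_gold(silver, predictions)
for row in gold:
    print(row)
```

Keeping unmatched rows makes gaps in model coverage visible in downstream reports rather than silently dropping turbines.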
In summary, this article covered the end-to-end steps for enabling IoT data analytics and machine learning on the Azure cloud platform, including some best practices, recommended services, and an application to a wind turbine operations use case. The blog series has the full details and code samples as well: Articles by Hubert Duan - The Databricks Blog.