End-to-End IoT analytics and machine learning with Azure Data and AI services
Published May 20 2021 07:39 AM
Microsoft

The Internet of Things (IoT) technology stack in Operational Technology (OT) is widely used across various industries, including oil & gas, manufacturing, utilities, and natural resources, for solving operational challenges and delivering mission-critical insights and analytics.

 

A growing number of organizations use Microsoft's Azure cloud platform to perform large-scale analytics and machine learning on data from IoT assets, something that has not been simple to do with traditional systems such as SCADA and historians.

 

In this article, we preview an end-to-end Azure Data and AI cloud architecture that enables IoT analytics. This article is based on our 3-part blog series on the Databricks Blog site. You can find more information and code samples starting with

 

Here is the overall architecture discussed in this article and the Databricks blog series:

 

teaserpicture.PNG

To ground this article in a common IoT use case, consider the scenario of balancing the optimal short-term utilization of an asset, such as a wind turbine, against its long-term maintenance costs.

balance.PNG

To develop insights on short-term optimization as well as longer-term maintenance costs, various data sources need to be considered and ingested into the cloud for centralized storage and analysis. Here are a few Azure cloud services to consider, depending on whether the data sources are stream processed or batch processed.

 

For the wind turbine scenario, streaming data can be the sensor data collected from the turbines, while structured data can be maintenance and failure data collected in a batch process.

 

Once the data sources are ingested into Azure, there are a few options for processing and storing the data, again depending on whether it is stream or batch processed. In this architecture, the Delta format in Azure Databricks, backed by Azure Data Lake Storage Gen2, is the preferred data format for large-scale IoT data sources. See: Delta Lake and Delta Engine guide - Azure Databricks - Workspace | Microsoft Docs

 

Once the data is ingested, processed, and stored in Delta format, Azure Databricks can be used for big data analytics, including data engineering and data science with Spark. A common pattern is to organize the data lake into multiple zones, each holding data at a different level of aggregation:

 

  • Bronze for raw granular IoT data
  • Silver for aggregated data, commonly used for machine learning and data science
  • Gold for enriched data ready for analytics and reporting purposes

Data engineers can use Azure Databricks to create three Delta tables, one for each of these zones. Users can work in Python, Scala, R, or SQL in Azure Databricks to accelerate data engineering and data science development.
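To make the bronze-to-silver step concrete, here is a minimal sketch in plain Python rather than Spark, so it runs anywhere. The record fields (`turbine_id`, `timestamp`, `rpm`, `power_kw`) and the `to_silver` helper are hypothetical names chosen for illustration; in Azure Databricks the same grouping would typically be written as a Spark aggregation that reads from and writes to Delta tables.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical bronze records: one row per raw sensor reading.
bronze = [
    {"turbine_id": "WT-1", "timestamp": "2021-05-01T00:05:00", "rpm": 10.0, "power_kw": 150.0},
    {"turbine_id": "WT-1", "timestamp": "2021-05-01T00:35:00", "rpm": 12.0, "power_kw": 170.0},
    {"turbine_id": "WT-1", "timestamp": "2021-05-01T01:10:00", "rpm": 8.0, "power_kw": 120.0},
    {"turbine_id": "WT-2", "timestamp": "2021-05-01T00:20:00", "rpm": 9.0, "power_kw": 140.0},
]


def to_silver(rows):
    """Aggregate raw readings into one row per turbine per hour."""
    groups = defaultdict(list)
    for row in rows:
        # Truncate each timestamp to the start of its hour.
        hour = datetime.fromisoformat(row["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        groups[(row["turbine_id"], hour)].append(row)

    silver_rows = []
    for turbine_id, hour in sorted(groups):
        readings = groups[(turbine_id, hour)]
        silver_rows.append({
            "turbine_id": turbine_id,
            "hour": hour.isoformat(),
            "avg_rpm": mean(r["rpm"] for r in readings),
            "avg_power_kw": mean(r["power_kw"] for r in readings),
            "reading_count": len(readings),
        })
    return silver_rows


silver = to_silver(bronze)
for row in silver:
    print(row)
```

Keeping the raw readings in bronze and writing the aggregates to a separate silver table means the aggregation can be re-run with a different window without re-ingesting the source data.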

 

In this IoT architecture, Azure Machine Learning is most commonly used together with Azure Databricks. For example, Azure Databricks can be used with Spark to engineer features and aggregate data. Azure Machine Learning can then be used to build models through code, drag-and-drop, or even automated machine learning, and to deploy and operationalize those models.

 

For the wind turbine scenario, the bronze Delta table could hold the granular IoT sensor data from the turbines, while the silver Delta table holds the aggregated data (by the hour, for example). Azure Databricks can then be used to perform feature engineering and feature selection to build a dataset that is ready for machine learning and analytics. This dataset would then be loaded into Azure Machine Learning to build a predictive maintenance model or a power generation prediction model.
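The feature engineering step might look something like the sketch below, again in plain Python for portability. Everything here is an illustrative assumption rather than the method from the blog series: the `build_features` helper, the lag and rolling-mean features, the two-hour label horizon, and the shape of the failure log are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical silver rows (hourly aggregates) for a single turbine.
silver = [
    {"turbine_id": "WT-1", "hour": "2021-05-01T00:00:00", "avg_power_kw": 160.0},
    {"turbine_id": "WT-1", "hour": "2021-05-01T01:00:00", "avg_power_kw": 120.0},
    {"turbine_id": "WT-1", "hour": "2021-05-01T02:00:00", "avg_power_kw": 140.0},
    {"turbine_id": "WT-1", "hour": "2021-05-01T03:00:00", "avg_power_kw": 100.0},
]

# Hypothetical batch-ingested failure log: turbine id -> failure timestamps.
failures = {"WT-1": [datetime(2021, 5, 1, 4, 30)]}


def build_features(rows, failure_log, horizon_hours=2, window=3):
    """Add lag and rolling features plus a failure label to hourly rows."""
    by_turbine = {}
    for row in rows:
        by_turbine.setdefault(row["turbine_id"], []).append(row)

    dataset = []
    for turbine_id, series in by_turbine.items():
        # ISO 8601 strings in the same format sort chronologically.
        series = sorted(series, key=lambda r: r["hour"])
        for i, row in enumerate(series):
            hour = datetime.fromisoformat(row["hour"])
            # Rolling window covers the current hour and up to window-1 earlier hours.
            recent = [r["avg_power_kw"] for r in series[max(0, i - window + 1): i + 1]]
            horizon_end = hour + timedelta(hours=horizon_hours)
            dataset.append({
                "turbine_id": turbine_id,
                "hour": row["hour"],
                "avg_power_kw": row["avg_power_kw"],
                "prev_power_kw": series[i - 1]["avg_power_kw"] if i > 0 else None,
                "rolling_mean_power_kw": sum(recent) / len(recent),
                # Label: does a logged failure fall in [hour, hour + horizon)?
                "fails_within_horizon": any(
                    hour <= f < horizon_end for f in failure_log.get(turbine_id, [])
                ),
            })
    return dataset


ml_dataset = build_features(silver, failures)
for row in ml_dataset:
    print(row)
```

Joining the streamed sensor aggregates with the batch maintenance and failure data is what turns the silver table into a labeled dataset that a predictive maintenance model can train on.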

 

Finally, once the predictions and enriched data are written to the gold Delta table with Azure Databricks, they can be loaded into Azure Synapse Analytics for BI analytics and reporting scenarios together with Power BI. Azure Data Explorer provides real-time operational analytics, so IoT data can also be streamed directly from Azure IoT Hub or Azure Event Hubs to Data Explorer.
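As a rough illustration of what "enriched" can mean for the gold zone, the sketch below joins hourly aggregates with model predictions on turbine and hour. The `to_gold` helper, the prediction fields, and the derived `power_error_kw` column are hypothetical; a real pipeline would read the predictions from the deployed model's output and write the result to the gold Delta table.

```python
# Hypothetical silver rows (hourly aggregates).
silver = [
    {"turbine_id": "WT-1", "hour": "2021-05-01T00:00:00", "avg_power_kw": 160.0},
    {"turbine_id": "WT-1", "hour": "2021-05-01T01:00:00", "avg_power_kw": 120.0},
    {"turbine_id": "WT-2", "hour": "2021-05-01T00:00:00", "avg_power_kw": 140.0},
]

# Hypothetical model output, keyed by (turbine_id, hour).
predictions = {
    ("WT-1", "2021-05-01T00:00:00"): {"predicted_power_kw": 155.0},
    ("WT-1", "2021-05-01T01:00:00"): {"predicted_power_kw": 130.0},
}


def to_gold(silver_rows, predicted):
    """Inner-join hourly aggregates with predictions and add a derived column."""
    gold_rows = []
    for row in silver_rows:
        pred = predicted.get((row["turbine_id"], row["hour"]))
        if pred is None:
            continue  # No prediction for this turbine-hour, so skip the row.
        gold_rows.append({
            **row,
            **pred,
            # Difference between observed and predicted power for reporting.
            "power_error_kw": row["avg_power_kw"] - pred["predicted_power_kw"],
        })
    return gold_rows


gold = to_gold(silver, predictions)
for row in gold:
    print(row)
```

Because each gold row already carries both the observed and predicted values, a reporting tool can chart them side by side without further joins.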

 

In summary, this article covered the end-to-end steps for enabling IoT data analytics and machine learning on the Azure cloud platform, including some best practices, recommended services, and their application to a wind turbine operations use case. The blog series has the full details and provides code samples as well: Articles by Hubert Duan - The Databricks Blog.

 

 

Last update:
‎May 19 2021 07:44 AM