Introduction
We were chosen as winners of the Climate Hackathon 2022 competition organized by Microsoft. The aim of the competition was to find new ways to combat climate change by utilizing new technologies. We entered with a solution that we had already started designing and building, and the hackathon gave us the urgency we needed to finalize it. Going forward, we are ready to continue turning the proposed solution into a marketable product that can help other companies improve their environmental sustainability.
The Problem
The competition had three distinct challenges, and each team chose one to solve. We chose challenge number two, set by Inditex, which targeted the precise calculation of energy and water consumption at manufacturing facilities.
For some time, we have been planning a carbon footprint calculation system for swimming halls. In the process, we have learned that the topic involves more than monitoring water and electricity use. For example, the mobility of repair and maintenance personnel and the logistics between swimming halls should be considered. Mobility and logistics are especially important because the vehicle fleet in Finland currently runs mainly on diesel and petrol. In the future, the situation might be different, since the Finnish Government is actively promoting the use of electric vehicles in transportation. In addition to these factors, other consumable goods (e.g., filters) and their associated carbon footprint should be considered.
The fundamental idea behind this solution was to build a versatile and easy-to-use system for managing carbon footprint. There are plenty of Excel-based solutions already on the market, but we wanted to build a centralized solution that can be integrated with different facilities. This type of system enables holistic gathering of all the relevant data to produce calculations that are as precise as possible. Not all data can be gathered directly from machinery (e.g., with IoT technology), so manual data entry must be supported.
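The factors listed above (energy and water use, mobility and logistics, consumables) can be combined with a simple "activity amount times emission factor" calculation. The following is a minimal Python sketch of that idea, not our production code. The `Activity` class, the `carbon_footprint` function, and the category names are hypothetical, and the factor values in the usage example are placeholders, not real emission factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Activity:
    """One measured or manually entered activity (hypothetical structure)."""
    category: str  # e.g. "electricity_kwh", "diesel_litre", "filter_piece"
    amount: float  # quantity in the unit that the emission factor expects


def carbon_footprint(activities, emission_factors):
    """Return (total, per_category) emissions in kg CO2e.

    emission_factors maps a category to kg CO2e per unit of that category.
    """
    per_category = {}
    for activity in activities:
        if activity.category not in emission_factors:
            raise KeyError(f"no emission factor for {activity.category!r}")
        emitted = activity.amount * emission_factors[activity.category]
        per_category[activity.category] = per_category.get(activity.category, 0.0) + emitted
    return sum(per_category.values()), per_category


# Placeholder factors for illustration only (assumed values, not real data).
PLACEHOLDER_FACTORS = {"electricity_kwh": 0.5, "diesel_litre": 2.0}

total, breakdown = carbon_footprint(
    [Activity("electricity_kwh", 100.0), Activity("diesel_litre", 10.0)],
    PLACEHOLDER_FACTORS,
)
```

Keeping the factors in a separate table means that a change in the fleet (for example, a move to electric vehicles) only requires new factor rows, not new calculation logic.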
The Solution
Our technical solution is based on Microsoft Azure services and resources. All the resources are in one Azure subscription and split into two resource groups. Common resources, like logging and IoT processing, are combined into one resource group, and the application resource group contains the resources used by the web application.
Users access the application with desktop or mobile web browsers. They authenticate against Azure Active Directory, and authorization is based on roles that are linked to the users' AD security groups. The web application retrieves this information through the Microsoft Graph API.
There are four main parts to our solution:
- Web application is the main user interface for inputting and maintaining data. It runs on App Service, with an ASP.NET Core backend that exposes an API and a React frontend. The UI is built with Material UI. The web application uses a database for storing state and other user-related information. It also communicates with the data warehouse for manual data inputs and batch spreadsheet uploads. The web application uses Azure Cache to speed up database queries.
- Databases: the application database and the data warehouse are Azure SQL databases, which are connected using external tables and remotely executed stored procedures. The data warehouse uses two schemas, one for staging and one for production entities.
- Data ingestion is based on Stream Analytics and IoT Hub, which is connected to a device hub in the manufacturing facility. IoT devices feed data to IoT Hub from different locations. Stream Analytics processes the feeds and fills gaps, when necessary, by using a machine learning model. The model is embedded in the Stream Analytics job script.
- Analysis views come in two types: Power BI reports in Embedded mode, which integrates the reports into the web application, and a map view inside the web application, which has geospatial analytics features such as routing, travel time, and travel range by carbon emission amount. Power BI reports are stored in and served from the Power BI service. This also requires the Azure Power BI Embedded service, where we can control the capacity running these reports. The Power BI SDK is used for the integration in the web application. The routing engine required by the map view runs on a separate virtual machine, which also hosts its web service. It fetches geographical data from the OpenStreetMap service.
Azure DevOps pipelines are used to deploy Azure resources as infrastructure as code. Source control with code validation is also handled there. Azure Key Vault is used for storing the web application's SSL certificate along with other secrets and settings. Resource usage is logged centrally to Azure Log Analytics, and Power BI reports are used for monitoring this information. Microsoft Defender for Cloud is used for security scanning and for auditing classified database columns.
Algorithms
Online Inference with Stream Analytics
We used Azure Stream Analytics for retrieving device data from IoT Hub, running SQL analyses, and writing the output to Azure SQL Database. Our decision tree model was used for the SQL-based inference jobs. The model is in SQL format; it was trained separately in a different environment with the training device data and then deployed here. With this approach, we can run advanced analyses directly on the data stream or after the data has been written to the database.
These analysis processes are used in multiple places in the solution, such as filling gaps in the device data time series, classifying events, and making predictions for device maintenance. The benefits of this method are accuracy, model explainability (Explainable AI), light weight, and versatility. It also works well with small amounts of data. The following section describes the training process and deployment.
XGBoost to SQL
We use a gradient boosting decision tree framework called XGBoost, which supports GPU utilization for massively parallel processing. Many other machine learning and deep learning frameworks offer various time series processing algorithms. For example, gaps in time series can be filled with an LSTM model, but that requires a lot of data and the model itself is not explainable. A simpler option would be a smoothing algorithm, but then the accuracy is not as good, especially when the gaps are longer.
We have implemented a tool for converting XGBoost models to SQL, so that we keep the accuracy but get a simpler and extremely fast algorithm. We run the training with R or Python training scripts. NVIDIA Rapids also has its own version of this model, and it is an excellent option for large datasets.
After the model is ready, we convert it from decision trees to a SQL script, which can be wrapped inside a SQL view or stored procedure or used as is. The SQL model contains many nested CASE-WHEN-THEN statements. Deployment of the model can be automated with an Azure DevOps pipeline or done manually by copying and pasting.
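To make the "simpler option" concrete, here is a minimal Python sketch of a baseline gap filler. It uses plain linear interpolation as a stand-in for a simple smoothing-style method; it is not the model we deployed. The function name `fill_gaps_linear`, the `(timestamp, value)` layout, and the fixed sampling `step` are assumptions made for illustration.

```python
def fill_gaps_linear(readings, step):
    """Fill missing samples in a time series by linear interpolation.

    readings: list of (timestamp_seconds, value) tuples sorted by timestamp.
    step: expected sampling interval in seconds.
    Returns a new list that also contains the interpolated samples.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    filled = []
    for (t0, v0), (t1, v1) in zip(readings, readings[1:]):
        filled.append((t0, v0))
        t = t0 + step
        # Walk through every expected timestamp that is missing between t0 and t1.
        while t < t1:
            fraction = (t - t0) / (t1 - t0)
            filled.append((t, v0 + fraction * (v1 - v0)))
            t += step
    if readings:
        filled.append(readings[-1])
    return filled


# One sample (at 120 s) is missing from this hypothetical series.
series = [(0, 10.0), (60, 12.0), (180, 16.0)]
repaired = fill_gaps_linear(series, step=60)
```

A straight line between the two neighbours ignores everything that happens inside the gap, which is one reason the accuracy of such baselines drops as gaps get longer.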
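The conversion step can be sketched in Python as follows. This is an illustrative sketch, not our actual tool. It assumes each tree is a dictionary in the layout of XGBoost's JSON model dump (keys `split`, `split_condition`, `yes`, `no`, `missing`, `children`, `leaf`), and it assumes a regression-style model where the prediction is a base score plus the sum of the leaf values. The feature names `water_temp` and `pump_power` and the function names are hypothetical.

```python
def tree_to_sql(node, indent=0):
    """Render one tree as a nested SQL CASE expression."""
    pad = "  " * indent
    if "leaf" in node:
        return f"{pad}{float(node['leaf'])!r}"
    children = {child["nodeid"]: child for child in node["children"]}
    column = f"[{node['split']}]"
    condition = f"{column} < {float(node['split_condition'])!r}"
    # XGBoost sends missing values to the "missing" child. In SQL a NULL
    # comparison is not true, so NULL falls to ELSE unless handled here.
    if node["missing"] == node["yes"]:
        condition = f"({condition} OR {column} IS NULL)"
    yes_sql = tree_to_sql(children[node["yes"]], indent + 1)
    no_sql = tree_to_sql(children[node["no"]], indent + 1)
    return (
        f"{pad}CASE WHEN {condition} THEN\n{yes_sql}\n"
        f"{pad}ELSE\n{no_sql}\n"
        f"{pad}END"
    )


def model_to_sql(trees, base_score=0.0):
    """Sum the base score and all tree expressions into one SQL expression."""
    parts = [repr(float(base_score))]
    parts += ["(\n" + tree_to_sql(tree, 1) + "\n)" for tree in trees]
    return " + ".join(parts)


# Hypothetical two-level tree with made-up thresholds and leaf values.
SAMPLE_TREE = {
    "nodeid": 0, "split": "water_temp", "split_condition": 27.5,
    "yes": 1, "no": 2, "missing": 1,
    "children": [
        {"nodeid": 1, "leaf": -0.4},
        {"nodeid": 2, "split": "pump_power", "split_condition": 3.0,
         "yes": 3, "no": 4, "missing": 4,
         "children": [{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.6}]},
    ],
}

sql_expression = model_to_sql([SAMPLE_TREE], base_score=0.5)
print(f"SELECT {sql_expression} AS prediction FROM readings")
```

The resulting expression can be placed in the select list of a view or a stored procedure. Classification objectives would need an extra transformation (for example a sigmoid) on top of the sum, which this sketch leaves out.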
Geospatial Analysis
To analyze carbon emissions from delivery trucks and maintenance staff transportation, we used a routing calculation solution in the map visualization. To find the area reachable within a certain carbon emission amount, we need to calculate every path, with the cost derived from the vehicle and fuel type. The cost limits the length of the routes. We used OpenStreetMap as the data source with an open-source routing framework, which uses Dijkstra's method combined with a calculated hierarchical 2D grid on the map. This function runs the calculation against an in-memory graph using optimized native C code and returns the area in GeoJSON format, which can be visualized on the map as a layer.
This processing is served from a separate virtual machine, which handles web service requests from the Azure App Service. It is essential that this works fast, because the analysis on the map is interactive and the system must respond immediately.
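The core of the reachability calculation can be illustrated with a budget-limited Dijkstra search. The Python sketch below is a simplified illustration only: it is plain Dijkstra on a small dictionary graph, without the hierarchical grid, the native C implementation, or the GeoJSON output of the routing framework. The function name, the graph layout, and the per-kilometre emission value are assumptions.

```python
import heapq


def reachable_within_budget(graph, start, budget, emission_per_km):
    """Return {node: emitted} for all nodes reachable within the budget.

    graph: {node: [(neighbor, distance_km), ...]}
    budget: maximum cumulative emissions (e.g. kg CO2e).
    emission_per_km: emissions per kilometre for the vehicle and fuel type.
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, distance_km in graph.get(node, []):
            new_cost = cost + distance_km * emission_per_km
            # The budget prunes the search, which limits the route length.
            if new_cost <= budget and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


# Hypothetical road graph with distances in kilometres.
ROADS = {
    "A": [("B", 2.0), ("C", 10.0)],
    "B": [("C", 3.0), ("D", 20.0)],
    "C": [("D", 4.0)],
    "D": [],
}

# 0.5 is an assumed emission value per kilometre, used for illustration only.
area = reachable_within_budget(ROADS, "A", budget=4.0, emission_per_km=0.5)
```

In this example, node C is outside the budget through the direct road from A but inside it through B, and node D is not reachable at all. A full implementation would turn the set of reachable nodes into a polygon for display on the map.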
Further Development
We will continue improving the system, and our plan is to publish it in Microsoft Azure Marketplace this year.
Team
We are a group of three people from Vantaa, Finland. In 2021 we established a company, Startecon, which focuses on IT consulting and software development. Our backgrounds are in IT consulting, software development, software architecture, and research on logistics sustainability. We aim to offer other companies IT solutions that are light, easy to use, and scalable. We also aim to use efficient project management methods to carry out our solutions. Please see our website for further information and inquiries.
Links used in the text
- Climate Hackathon: https://climatehackathon.devpost.com/
- Macsimum: https://macsimum.no/
- Microsoft: https://www.microsoft.com/
- Inditex: https://www.inditex.com/
- Explainable AI: https://en.wikipedia.org/wiki/Explainable_artificial_intelligence
- XGBoost: https://xgboost.readthedocs.io/en/stable/
- LSTM: https://en.wikipedia.org/wiki/Long_short-term_memory
- NVIDIA Rapids: https://rapids.ai/
- Dijkstra: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
- Startecon: https://startecon.fi/