Using ML.NET to estimate water consumption from acceleration measures
Published Nov 14 2022 08:36 AM
Microsoft

Machine learning (ML) is everywhere. You use ML-powered applications without even noticing it: when you click on product recommendations on an e-commerce website, for example, or when you consult your navigation app to choose the best route to a new place.

As a developer, you certainly cannot ignore this trend. And if you are familiar with .NET, ML.NET could be an easy win! ML.NET is a framework designed for .NET developers who wish to add machine learning capabilities to their applications without leaving the .NET ecosystem, writing code in the languages they already know.

In this blog post, I'll show you how I used this framework to build a regression model that predicts how much water a user has drunk from, or refilled into, a glass, based on the acceleration measurements of the glass itself. This solution is part of the project developed for .NET StudentZone Conf 2022.

The raw data for this solution was collected by IoT weight and acceleration sensors installed on a glass. The weight measurements are expressed in grams, while the acceleration is measured on three axes: x, y and z.

carlottacaste_0-1668182946914.png

For training purposes, raw data has been aggregated using an action window, where an action is the user drinking X grams of water or the user refilling the glass with Y grams of water. The figure below shows the first few records of the final dataset.

Figure: the first few records of the final dataset (carlottacaste_0-1668182807785.png)
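As a rough illustration of the aggregation described above, the sketch below collapses raw sensor readings into one row per action window. It is not the notebook's code: the `Reading` and `ActionWindow` types, their column names, the assumption that each raw reading is already tagged with its ActionId, and the choice of "last weight minus first weight" as the delta are all hypothetical, and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Made-up readings for illustration only: action 1 is a drink, action 2 a refill.
var start = new DateTime(2022, 11, 1, 9, 0, 0);
var readings = new List<Reading>
{
    new Reading(1, start,                250f,  0.10f, 0.02f, 0.98f),
    new Reading(1, start.AddSeconds(2),  230f,  0.35f, 0.05f, 0.90f),
    new Reading(1, start.AddSeconds(4),  210f,  0.12f, 0.03f, 0.97f),
    new Reading(2, start.AddSeconds(60), 210f, -0.02f, 0.01f, 1.00f),
    new Reading(2, start.AddSeconds(65), 330f, -0.04f, 0.00f, 1.01f),
};

// Group the readings by action and reduce each group to a single row.
List<ActionWindow> windows = readings
    .GroupBy(r => r.ActionId)
    .Select(g =>
    {
        var ordered = g.OrderBy(r => r.Time).ToList();
        return new ActionWindow(
            ActionId: g.Key,
            Duration: ordered[^1].Time - ordered[0].Time,
            AvgAccX: ordered.Average(r => r.AccX),
            AvgAccY: ordered.Average(r => r.AccY),
            AvgAccZ: ordered.Average(r => r.AccZ),
            // Assumed definition: negative delta = drinking, positive delta = refilling.
            WeightDelta: ordered[^1].Weight - ordered[0].Weight);
    })
    .ToList();

foreach (var window in windows)
    Console.WriteLine(window);

// Hypothetical raw sample: one row per sensor reading.
record Reading(int ActionId, DateTime Time, float Weight, float AccX, float AccY, float AccZ);

// Hypothetical aggregated row: one row per action window.
record ActionWindow(int ActionId, TimeSpan Duration, float AvgAccX, float AvgAccY, float AvgAccZ, float WeightDelta);
```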

 

All the code is written in C# as a Polyglot Notebook (previously known as .NET Interactive Notebook) and is available in the .NET Student Zone GitHub repository. It can be run in the cloud, using GitHub Codespaces, or locally, using Visual Studio Code.

The features were selected through a preliminary exploratory data analysis, performed with the Microsoft.Data.Analysis library and supported by some data visualization with Plotly.NET. The label value is a weight delta: a positive delta corresponds to a water refill and a negative delta corresponds to drinking water.

 

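Figure: distribution of the average acceleration on the x axis for water refill and water consumption (carlottacaste_1-1668182807790.png)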

 

Plotting the correlation between the label (weight delta) and some of the dataset variables, I observe that:

  • The measurement timestamps are concentrated in a short period (brief slots over a limited range of days), so they can hardly help the model extract a pattern of drinking habits.
  • There is a positive correlation between window duration and weight delta. For refills in particular, the larger the quantity of water added, the longer the action lasts in seconds.
  • The average acceleration distributions for water refill and water consumption are clearly distinguishable, with no overlap. The figure above shows only the acceleration on the x axis, but similar reasoning applies to the other two axes.
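An observation like the second one can also be checked numerically. The sketch below computes the Pearson correlation coefficient between two columns in plain C#. The post only mentions plotting the correlation, so this is just one possible way to quantify it; the `Pearson` helper is a hypothetical name and the sample values are made up for illustration.

```csharp
using System;
using System.Linq;

// Pearson correlation coefficient between two equally sized samples.
// Returns NaN if either sample has zero variance.
double Pearson(double[] x, double[] y)
{
    if (x.Length != y.Length || x.Length < 2)
        throw new ArgumentException("Samples must have the same length (at least 2).");

    double meanX = x.Average();
    double meanY = y.Average();
    double covariance = 0, varianceX = 0, varianceY = 0;

    for (int i = 0; i < x.Length; i++)
    {
        double dx = x[i] - meanX;
        double dy = y[i] - meanY;
        covariance += dx * dy;
        varianceX += dx * dx;
        varianceY += dy * dy;
    }

    return covariance / Math.Sqrt(varianceX * varianceY);
}

// Made-up refill actions: window duration in seconds and weight delta in grams.
double[] durationSeconds = { 2.0, 3.5, 5.0, 6.5 };
double[] weightDelta = { 40, 75, 110, 150 };

Console.WriteLine($"r = {Pearson(durationSeconds, weightDelta):F3}");
```

A value close to 1 would support treating window duration as a useful feature, while a value near 0 would suggest dropping it.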

This type of analysis helps to exclude some columns from the training dataset (like Time and ActionId, the latter being just a random identifier of the action) and to validate the others as potentially good features for weight predictions. The other pre-processing steps performed before starting the machine learning experiment are:

  • Converting the window duration into a float, to use it as a numerical feature;
  • Filtering out rows with an absolute weight delta lower than 1 g, since they are most likely bad recordings or bad measurements and do not correspond to any action of drinking or refilling;
  • Splitting the dataset into two subsets, a training dataset and a test dataset, and holding the smaller one (the test dataset) back to evaluate the model.
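The three steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the `RawAction` and `ActionRow` classes and their column names are hypothetical, the rows are synthetic, the filtering is done with LINQ for brevity, and the 80/20 split ratio is an assumption, since the post does not state one. The `#r` line is the Polyglot Notebook way of referencing a NuGet package; in a regular project, add the Microsoft.ML package reference instead and drop that line.

```csharp
#r "nuget: Microsoft.ML"

using System;
using System.Linq;
using Microsoft.ML;

var mlContext = new MLContext(seed: 0);

// Synthetic rows for illustration only.
var random = new Random(0);
var rawRows = Enumerable.Range(0, 50).Select(_ => new RawAction
{
    WindowDuration = TimeSpan.FromSeconds(1 + random.NextDouble() * 9),
    AvgAccX = (float)random.NextDouble() - 0.5f,
    WeightDelta = (float)(random.NextDouble() * 200 - 100)
}).ToList();

var rows = rawRows
    // Step 2: keep only rows whose absolute weight delta is at least 1 g.
    .Where(r => Math.Abs(r.WeightDelta) >= 1f)
    // Step 1: expose the window duration as a float number of seconds.
    .Select(r => new ActionRow
    {
        DurationSeconds = (float)r.WindowDuration.TotalSeconds,
        AvgAccX = r.AvgAccX,
        WeightDelta = r.WeightDelta
    })
    .ToList();

// Step 3: load into an IDataView and hold 20% back as a test set (assumed ratio).
IDataView data = mlContext.Data.LoadFromEnumerable(rows);
var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

int trainCount = mlContext.Data
    .CreateEnumerable<ActionRow>(split.TrainSet, reuseRowObject: false).Count();
int testCount = mlContext.Data
    .CreateEnumerable<ActionRow>(split.TestSet, reuseRowObject: false).Count();
Console.WriteLine($"Train rows: {trainCount}, test rows: {testCount}");

// Hypothetical shape of an aggregated action before pre-processing.
public class RawAction
{
    public TimeSpan WindowDuration { get; set; }
    public float AvgAccX { get; set; }
    public float WeightDelta { get; set; }
}

// Hypothetical shape of a row ready for training.
public class ActionRow
{
    public float DurationSeconds { get; set; }
    public float AvgAccX { get; set; }
    public float WeightDelta { get; set; }
}
```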

At this point, I am ready to build my training pipeline with Automated machine learning (AutoML). AutoML automates the process of applying machine learning to data. Given a dataset and a few simple inputs, it outputs the best trainer and the best hyperparameter setup for a specific scenario.

In this use case, the inputs I pass to my AutoML experiment are:

  • The training dataset resulting from previous transformations;
  • The type of task, which is regression, since I want to predict a numeric value;
  • The metric used to identify the best model, which is RSquared, one of the most common evaluation metrics for regression problems.
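A minimal sketch of an experiment with these three inputs, using the regression experiment API of the Microsoft.ML.AutoML package, might look like the following. The notebook may well use a different AutoML API or different settings, so treat this as an illustration only: the `ActionRow` class, its column names, the synthetic data and the 30-second time budget are all assumptions. As before, the `#r` lines apply to a Polyglot Notebook.

```csharp
#r "nuget: Microsoft.ML"
#r "nuget: Microsoft.ML.AutoML"

using System;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.AutoML;

var mlContext = new MLContext(seed: 0);

// Synthetic training rows for illustration only: the delta grows with the
// duration, and its sign follows the sign of the average x acceleration.
var random = new Random(0);
var rows = Enumerable.Range(0, 200).Select(_ =>
{
    float duration = 1f + (float)random.NextDouble() * 9f;
    float accX = (float)random.NextDouble() - 0.5f;
    return new ActionRow
    {
        DurationSeconds = duration,
        AvgAccX = accX,
        WeightDelta = 20f * duration * Math.Sign(accX)
    };
}).ToList();

// Input 1: the training dataset.
IDataView trainData = mlContext.Data.LoadFromEnumerable(rows);

var settings = new RegressionExperimentSettings
{
    MaxExperimentTimeInSeconds = 30,               // assumed time budget
    OptimizingMetric = RegressionMetric.RSquared   // input 3: the metric
};

// Input 2: the task type, expressed by creating a regression experiment.
var experiment = mlContext.Auto().CreateRegressionExperiment(settings);
var result = experiment.Execute(trainData, labelColumnName: nameof(ActionRow.WeightDelta));

Console.WriteLine($"Best trainer: {result.BestRun.TrainerName}");
Console.WriteLine($"Validation R^2: {result.BestRun.ValidationMetrics.RSquared:F3}");

// Hypothetical shape of a training row.
public class ActionRow
{
    public float DurationSeconds { get; set; }
    public float AvgAccX { get; set; }
    public float WeightDelta { get; set; }
}
```

An RSquared value close to 1 means the model explains most of the variance of the weight delta, which is why a higher value identifies a better run.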

To see and understand the results of my experiment, you can watch my session below.

Finally, if you are curious about where the IoT data comes from, or how the trained model from my experiment is consumed in a MAUI application, you can explore the other sessions of the .NET StudentZone Conf and see how all the different pieces and technologies of the .NET ecosystem are combined into a single end-to-end project.

 

Last update: Nov 22 2022 08:50 AM