Time Series Forecasting in ML.NET and Notebooks in Azure ML Studio
In this sample, you’ll learn how to run time series forecasting in a Jupyter notebook. We’ll read in data from a CSV file, create some exploratory plots, fit a regression model, and then fit a more sophisticated Singular Spectrum Analysis (SSA) forecaster.
Download the source code
Go to the GitHub repo and copy the “clone” link to run this tutorial on your own machine.
Install C# Kernel
Note: These instructions only apply if you intend to run this notebook in Azure Machine Learning. You can also run this notebook on your local machine by following the instructions in the dotnet interactive GitHub repo.
- Go to ml.azure.com. Select your subscription and machine learning workspace.
- Open the "Notebooks" tab on the left-hand side of the page.
- Create a compute instance if you haven’t already, or select an existing one from the drop-down menu.
- Open a notebook file (one with an .ipynb extension).
- Select the Terminal button at the top right.
- Follow the instructions here to register the Microsoft package signing key and package repository, and install .NET Core 3.1.
- Install dotnet interactive by running dotnet tool install -g --add-source "https://dotnet.myget.org/F/dotnet-try/api/v3/index.json" dotnet-interactive
- Create a symlink between the installed location of dotnet interactive and your local bin directory: sudo ln -s /home/azureuser/.dotnet/tools/dotnet-interactive /usr/local/bin/dotnet-interactive
- Set your dotnet root directory: export DOTNET_ROOT=$(dirname $(realpath $(which dotnet)))
- Install the Jupyter kernel: dotnet interactive jupyter install
- Verify the installation by running jupyter kernelspec list. You should see ".net-fsharp" and ".net-csharp" listed as kernels.
Install MKL on Ubuntu Linux
If you are running ML.NET for the first time on an Ubuntu Linux machine (such as an Azure Machine Learning compute instance), follow these instructions to install the required dependencies.
Start visualizing data
Great! We’re now set up to run ML.NET in Azure ML integrated notebooks. Let’s begin by visualizing our data with the XPlot library. Notice that the data follow a sinusoidal pattern, but with a good amount of noise.
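The notebook reads its data from a CSV file before plotting it. As a minimal sketch of that step in plain C# (no ML.NET or XPlot), the code below parses a hypothetical two-column layout with a header row into time and value arrays, the shape most plotting libraries expect. The column layout and the ParseSeries helper are assumptions for illustration, not the notebook’s actual loader.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical layout: a header row followed by "time,value" rows.
// In a notebook you would get the lines with File.ReadAllLines("your-data.csv").
(double[] Time, double[] Value) ParseSeries(string[] lines)
{
    var rows = lines
        .Skip(1)                                          // skip the header row
        .Where(line => !string.IsNullOrWhiteSpace(line))  // ignore blank lines
        .Select(line => line.Split(','))
        .Select(parts => (
            T: double.Parse(parts[0], CultureInfo.InvariantCulture),
            Y: double.Parse(parts[1], CultureInfo.InvariantCulture)))
        .ToArray();
    return (rows.Select(r => r.T).ToArray(), rows.Select(r => r.Y).ToArray());
}

// Made-up sample rows, used only to exercise the parser.
var sample = new[] { "t,value", "0,1.5", "1,2.25", "2,0.75" };
var (time, value) = ParseSeries(sample);
Console.WriteLine($"{time.Length} points, mean value {value.Average():F2}");
```

Parsing with CultureInfo.InvariantCulture keeps the decimal point interpretation stable regardless of the machine’s locale.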
Compute an engineered feature
As we mentioned, the data display a sinusoidal pattern, so let’s use that intuition to fit a regression model with an engineered feature. Specifically, let’s fit a model that uses a cosine of time as its independent variable. Below, consider how well a cosine curve mimics the periodicity of our original data. The only things that are off are the height of each wave (its “amplitude”, half the distance from crest to trough) and the wave’s vertical offset (its y-intercept). Luckily, linear regression can estimate both: regressing the data on the cosine feature yields a slope that plays the role of the amplitude and an intercept that sets the offset.
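The engineered-feature idea can be sketched in plain C#. The model is y ≈ b0 + b1 * cos(2πt / P): the intercept b0 shifts the wave vertically and the slope b1 sets its amplitude. The CosineFeature and FitLine helpers and the period P = 12 are hypothetical, chosen for illustration; the notebook itself fits the regression with ML.NET. As a sanity check, the example recovers known coefficients from a noise-free synthetic wave.

```csharp
using System;
using System.Linq;

// Engineered feature: cos(2*pi*t / period). The period is an assumption here;
// in practice you would read it off the exploratory plot.
double[] CosineFeature(double[] time, double period) =>
    time.Select(t => Math.Cos(2 * Math.PI * t / period)).ToArray();

// Ordinary least squares for y = intercept + slope * x with a single feature.
// The slope acts as the amplitude, the intercept as the vertical offset.
(double Intercept, double Slope) FitLine(double[] x, double[] y)
{
    double xMean = x.Average(), yMean = y.Average();
    double sxy = 0, sxx = 0;
    for (int i = 0; i < x.Length; i++)
    {
        sxy += (x[i] - xMean) * (y[i] - yMean);
        sxx += (x[i] - xMean) * (x[i] - xMean);
    }
    double slope = sxy / sxx;
    return (yMean - slope * xMean, slope);
}

// Synthetic, noise-free data: 10 + 3 * cos(2*pi*t / 12) should give back (10, 3).
var times = Enumerable.Range(0, 48).Select(i => (double)i).ToArray();
var feature = CosineFeature(times, period: 12);
var target = feature.Select(v => 10 + 3 * v).ToArray();
var (intercept, amplitude) = FitLine(feature, target);
Console.WriteLine($"intercept = {intercept:F3}, amplitude = {amplitude:F3}");
```

With real, noisy data the recovered values will only approximate the true offset and amplitude, and the fit still depends on guessing the period correctly.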
Fit a linear regression model
Let’s try fitting a model using the engineered feature from the previous step. Because the input data are so nicely sinusoidal, this model actually works quite well: it has a Mean Absolute Error (MAE) of 1.997 and a Root Mean Squared Error (RMSE) of 2.574. Let’s see if we can do better.
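For reference, both metrics can be computed directly from actual and predicted values. This sketch uses hypothetical Mae and Rmse helpers and made-up numbers to show the definitions. If you evaluate with ML.NET’s mlContext.Regression.Evaluate, the returned RegressionMetrics exposes the same quantities as MeanAbsoluteError and RootMeanSquaredError.

```csharp
using System;
using System.Linq;

// Mean Absolute Error: the average of |actual - predicted|.
double Mae(double[] actual, double[] predicted) =>
    actual.Zip(predicted, (a, p) => Math.Abs(a - p)).Average();

// Root Mean Squared Error: the square root of the average squared error.
double Rmse(double[] actual, double[] predicted) =>
    Math.Sqrt(actual.Zip(predicted, (a, p) => (a - p) * (a - p)).Average());

// Made-up values: absolute errors are 1, 0, 2, 1.
var actualValues = new[] { 3.0, 5.0, 2.0, 8.0 };
var predictedValues = new[] { 2.0, 5.0, 4.0, 7.0 };
Console.WriteLine($"MAE = {Mae(actualValues, predictedValues)}");
Console.WriteLine($"RMSE = {Rmse(actualValues, predictedValues)}");
```

Because RMSE squares each error before averaging, it is never smaller than MAE and grows faster when a few predictions miss badly, which is consistent with the RMSE being the larger of the two numbers reported above.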
Use ML.NET’s SSA Forecasting Transformer
ML.NET’s SsaForecastingTransformer can fit a forecasting model on our original data, without requiring us to provide engineered features. Most of its required parameters depend on how much data you have and how far into the future you want to predict. The only tricky one is windowSize, which should be set to twice the length of the longest seasonality you expect in the data. For example, if you collect data once per day in an environment that shows both monthly and yearly seasonality, set windowSize to twice the length of a year, or 730. See the example notebook for more details on the other parameters.
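The windowSize rule of thumb is simple arithmetic. Here is a minimal sketch, assuming daily data and approximating a month as 30 days; the WindowSizeFor helper is hypothetical. In ML.NET, the result would be passed as the windowSize argument of mlContext.Forecasting.ForecastBySsa, alongside seriesLength, trainSize, and horizon.

```csharp
using System;
using System.Linq;

// Rule of thumb from the text: windowSize = 2 x the longest expected seasonal period,
// measured in sampling steps (here, days).
int WindowSizeFor(params int[] seasonalPeriods) => 2 * seasonalPeriods.Max();

int monthlyPeriod = 30;   // approximate month length: an assumption for illustration
int yearlyPeriod = 365;
int windowSize = WindowSizeFor(monthlyPeriod, yearlyPeriod);
Console.WriteLine(windowSize); // 730
```

Only the longest period drives the result, since a window spanning two years also spans many months.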
Notice that the SSA forecaster not only achieves a lower MAE and RMSE (1.963 and 2.491, respectively) but also provides 95% confidence bounds.
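One way to sanity-check the confidence bounds on held-out data is to count how often the actual values fall between them. For well-calibrated 95% bounds, that fraction should be near 0.95 over a reasonably long test period. The Coverage helper and the numbers below are hypothetical and not part of ML.NET. ML.NET returns the bounds as separate columns when you set confidenceLowerBoundColumn and confidenceUpperBoundColumn in ForecastBySsa.

```csharp
using System;
using System.Linq;

// Fraction of actual values that fall inside [lower, upper] at the same time step.
double Coverage(double[] actual, double[] lower, double[] upper) =>
    Enumerable.Range(0, actual.Length)
        .Count(i => actual[i] >= lower[i] && actual[i] <= upper[i]) / (double)actual.Length;

// Made-up values: the first three points lie inside their bounds, the last does not.
var actualSeries = new[] { 4.0, 6.0, 5.5, 9.0 };
var lowerBound = new[] { 3.0, 5.0, 5.0, 6.0 };
var upperBound = new[] { 5.0, 7.0, 6.0, 8.0 };
Console.WriteLine(Coverage(actualSeries, lowerBound, upperBound));
```

With only a handful of points the observed fraction is very noisy, so treat it as a rough check rather than a calibration test.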
Predict future values
Now that we’ve found our model of interest, let’s use it to predict the future! We simply retrain the model on all of the data, use CreateTimeSeriesEngine to get a prediction engine, and call Predict() to forecast points up to the horizon we specified during training.
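The evaluate-then-retrain workflow can be sketched independently of ML.NET. Here, a seasonal naive forecaster stands in for SSA. It is a deliberately simple substitute that repeats the value from one season earlier. The point is the order of operations: score the model on a held-out tail, then refit on the full series and forecast a fixed number of steps past its end. This mirrors the notebook, where the retrained SSA model is wrapped with CreateTimeSeriesEngine and queried with Predict(). The SeasonalNaive helper, the season length, and the synthetic series are all assumptions.

```csharp
using System;
using System.Linq;

// Stand-in forecaster (seasonal naive), NOT ML.NET's SSA: step h after the end of
// the history repeats the value observed one season earlier.
double[] SeasonalNaive(double[] history, int season, int horizon) =>
    Enumerable.Range(0, horizon)
        .Select(h => history[history.Length - season + (h % season)])
        .ToArray();

// Synthetic periodic series with a 12-step season (assumption for illustration).
double[] series = Enumerable.Range(0, 36)
    .Select(i => 10 + 3 * Math.Cos(2 * Math.PI * i / 12))
    .ToArray();
int seasonLength = 12, forecastHorizon = 6, trainCount = 30;

// 1. Evaluate: fit on the first trainCount points and forecast the held-out tail.
double[] holdoutForecast = SeasonalNaive(series.Take(trainCount).ToArray(), seasonLength, forecastHorizon);

// 2. Retrain on all of the data, then forecast forecastHorizon steps past the end.
double[] futureForecast = SeasonalNaive(series, seasonLength, forecastHorizon);
Console.WriteLine(string.Join(", ", futureForecast.Select(v => v.ToString("F2"))));
```

Retraining on the full series matters because the most recent observations carry the most information about the next few steps; the held-out score from step 1 is what tells you whether the model is worth retraining at all.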
In this notebook, you learned how to do time series forecasting in ML.NET with Jupyter notebooks. We initially used linear regression with an engineered feature, but we were able to improve performance by relying on ML.NET's SSA forecaster.
To learn more about C# and Jupyter Notebooks, check out this GitHub repo.
To see another example of using ML.NET in Jupyter, check out this blog.
To learn about using DataFrames in C#, check out this blog.
To get started with Model Builder in Visual Studio, try this getting started tutorial.