Time Series Forecasting in ML.NET and Notebooks in Azure ML Studio
In this sample, you will learn how to run time series forecasting in a Jupyter notebook. We will read in data from a CSV file, draw some exploratory plots, fit a regression model, and then fit a more sophisticated Singular Spectrum Analysis (SSA) forecaster.
Create a symlink between the installed location of dotnet interactive and your local bin directory: sudo ln -s /home/azureuser/.dotnet/tools/dotnet-interactive /usr/local/bin/dotnet-interactive
Set your dotnet root directory: export DOTNET_ROOT=$(dirname $(realpath $(which dotnet)))
Install the jupyter kernel: dotnet interactive jupyter install
Verify the installation by running jupyter kernelspec list. You should see ".net-fsharp" and ".net-csharp" listed as kernels.
Install MKL on Ubuntu Linux
If you are running ML.NET for the first time on an Ubuntu Linux machine (like Azure Machine Learning notebooks), please follow these instructions to download the required dependencies.
Start visualizing data
Great! We’re now set up to run ML.NET in Azure ML Integrated Notebooks. Let’s begin by visualizing our data, using the XPlot library. Notice how the data display a sinusoidal pattern, but there’s also a good amount of noise.
Compute an engineered feature
As we mentioned, the data display a sinusoidal pattern, so let’s use that intuition to fit a regression model with an engineered feature. Specifically, let’s fit a model using a cosine function as our independent variable. Below, consider how well a cosine model can mimic the periodicity of our original data. The only things that are wrong are the height of each wave above its midline (the “amplitude”) and the vertical offset of the wave (the y-intercept). Luckily, linear regression can give us these values.
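The notebook builds this feature in C#. As a language-neutral sketch of the arithmetic only, the Python below computes a unit-amplitude cosine feature for each time step. The function name `cosine_feature` and the period of 12 time steps are assumptions made for illustration; they are not taken from the sample data.

```python
import math

def cosine_feature(t, period):
    """Return cos(2*pi*t/period): a unit-amplitude wave repeating every `period` steps."""
    return math.cos(2.0 * math.pi * t / period)

# Hypothetical period of 12 time steps (an assumption, not from the sample data).
PERIOD = 12

# One feature value per time step, covering two full cycles.
features = [cosine_feature(t, PERIOD) for t in range(24)]
```

The feature always swings between -1 and 1 around zero, which is why the regression step still has to supply the amplitude and the offset.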
Fit a linear regression model
Let’s try fitting a model using our engineered feature from the previous step. Because the input data are so nicely sinusoidal, this model works quite well: it has a Mean Absolute Error (MAE) of 1.997 and a Root Mean Squared Error (RMSE) of 2.574. Let’s see if we can do better.
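To make the regression step and the two error metrics concrete, here is a minimal Python sketch, not the ML.NET code from the notebook. It fits a single-feature least-squares line in closed form and defines MAE and RMSE. The series is made up and noise-free (amplitude 5, offset 20, period 12), and the helper names `fit_line`, `mae`, and `rmse` are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (single feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mae(actual, predicted):
    """Mean Absolute Error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error: penalizes large errors more heavily than MAE."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Made-up, noise-free series: amplitude 5, offset 20, period 12 (all assumptions).
xs = [math.cos(2.0 * math.pi * t / 12) for t in range(48)]
ys = [5.0 * x + 20.0 for x in xs]

# The slope recovers the amplitude and the intercept recovers the offset.
amplitude, offset = fit_line(xs, ys)
predictions = [amplitude * x + offset for x in xs]
```

With noisy data, the fitted slope and intercept would only approximate the true amplitude and offset, and both metrics would be nonzero. RMSE is never smaller than MAE on the same predictions, which matches the figures reported above.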
Use ML.NET’s SSA Forecasting Transformer
ML.NET’s SsaForecastingTransformer can fit a forecasting model on our original data without requiring any engineered features. Most of the required parameters follow from how much data you have and how far into the future you want to predict. The only tricky one is the “windowSize” parameter, which should be set to twice the length of the longest expected seasonal cycle in the data. For example, if you have data collected once per day in an environment that shows both monthly and yearly seasonality, you should set windowSize to twice the length of a year, or 730. See the example notebook for more details on the other parameters.
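The windowSize rule of thumb above is simple arithmetic, sketched here in Python rather than in the notebook's C#. The helper name `window_size_for` is hypothetical, and treating a month as roughly 30 daily samples is an approximation.

```python
def window_size_for(season_lengths):
    """Apply the rule of thumb above: twice the longest expected season, in samples."""
    return 2 * max(season_lengths)

# Daily data with monthly (~30 samples) and yearly (365 samples) seasonality.
window_size = window_size_for([30, 365])
print(window_size)  # 730
```

Season lengths are counted in samples, not calendar units, so the same yearly cycle in hourly data would give a much larger window.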
Notice that the SSA Forecasting Transformer not only gives us a lower MAE and RMSE (1.963 and 2.491, respectively) but also provides 95% confidence bounds.
Predict future values
Now that we’ve found our model of interest, let’s use it to predict the future! We can simply retrain the model on all of the data, use CreateTimeSeriesEngine to get a predictor, and then call Predict() to forecast points up to the horizon we specified during training.
In this notebook, you learned how to do time series forecasting in ML.NET with Jupyter notebooks. We initially used linear regression with an engineered feature, but we were able to improve performance by relying on ML.NET's SSA forecaster.