Time Series Forecasting in ML.NET and Notebooks in Azure ML Studio
In this sample, you will learn how to run time series forecasting in a Jupyter notebook. We will read in data from a CSV file, make some exploratory plots, fit a regression model, and then fit a more sophisticated Singular Spectrum Analysis (SSA) forecaster.
Access the GitHub repo and copy the “clone” link in order to run this tutorial on your own machine.
Note: These instructions apply only if you intend to run this notebook in Azure Machine Learning. You can also run this notebook on your local machine by following the instructions at the dotnet interactive GitHub repo.
If you are running ML.NET for the first time on an Ubuntu Linux machine (like Azure Machine Learning notebooks), please follow these instructions to download the required dependencies.
Great! We’re now set up to run ML.NET in Azure ML Integrated Notebooks. Let’s begin by visualizing our data, using the XPlot library. Notice how the data display a sinusoidal pattern, but there’s also a good amount of noise.
As we mentioned, the data display a sinusoidal pattern, so let’s use that intuition to fit a regression model with an engineered feature. Specifically, let’s fit a model using a cosine function as our independent variable. Below, consider how well a cosine model can mimic the periodicity of our original data. The only things it gets wrong are the height of each wave (the “amplitude”) and the vertical offset of the wave (its baseline level). Luckily, linear regression can give us these values: the coefficient on the cosine feature is the amplitude, and the intercept is the offset.
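To make the idea concrete, here is a minimal, language-neutral sketch in Python (not the notebook's C# code). It assumes the period is already known and that the wave peaks at t = 0. The function name fit_cosine_regression and the synthetic series are hypothetical illustrations, not the tutorial's data.

```python
import math

def fit_cosine_regression(values, period):
    """Fit y ~ intercept + amplitude * cos(2*pi*t/period) by ordinary least squares.

    Assumes the period is known and the wave peaks at t = 0 (no phase shift).
    """
    n = len(values)
    # Engineered feature: a unit-amplitude cosine with the assumed period.
    feature = [math.cos(2 * math.pi * t / period) for t in range(n)]
    mean_x = sum(feature) / n
    mean_y = sum(values) / n
    # Closed-form simple linear regression with one feature.
    sxx = sum((x - mean_x) ** 2 for x in feature)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, values))
    amplitude = sxy / sxx
    intercept = mean_y - amplitude * mean_x
    return intercept, amplitude

# Synthetic, noise-free series (hypothetical values): offset 10, amplitude 4, period 12.
series = [10.0 + 4.0 * math.cos(2 * math.pi * t / 12) for t in range(48)]
intercept, amplitude = fit_cosine_regression(series, period=12)
```

On noise-free data like this, the fit recovers the offset and amplitude that generated the series. On noisy data it returns the least-squares estimates instead.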
Let’s try fitting a model using our engineered feature from the previous step. Because the input data are so nicely sinusoidal, this model actually works quite well. It has a Mean Absolute Error (MAE) of 1.997 and a Root Mean Squared Error (RMSE) of 2.574. Let’s see if we can do better.
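For reference, the two error metrics quoted above can be computed as in the following Python sketch. The helper names and the four sample values are hypothetical and are unrelated to the tutorial's data or results.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the average squared difference; penalizes large errors more."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))

# Hypothetical values for illustration only.
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]
mae = mean_absolute_error(actual, predicted)       # errors 1, 0, 2, 1 -> MAE 1.0
rmse = root_mean_squared_error(actual, predicted)  # sqrt((1 + 0 + 4 + 1) / 4)
```

RMSE is never smaller than MAE on the same errors, and the gap widens when a few errors are much larger than the rest.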
ML.NET’s SsaForecastingTransformer can fit a forecasting model on our original data, without our having to provide it with engineered features. Most of the required parameters are based on the amount of data you have and how far into the future you expect to predict. The only tricky one is the “windowSize” parameter, which should be set to twice the length of the longest expected seasonality in the data. For example, if you have data that is collected once per day in an environment that shows both monthly and yearly seasonality, you should set windowSize to twice the length of the year, or 730. See the example notebook for more details on the other parameters.
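The rule of thumb above amounts to a one-line calculation. The Python sketch below shows it; the helper name suggested_window_size is hypothetical, and it encodes only the guidance in this post, not any validation that ML.NET itself performs.

```python
def suggested_window_size(seasonal_periods):
    """Return twice the longest seasonal period, measured in observations.

    Encodes the rule of thumb from the text; it is not an ML.NET API.
    """
    return 2 * max(seasonal_periods)

# Daily data with monthly (about 30 observations) and yearly (365 observations) seasonality.
window_size = suggested_window_size([30, 365])
```

The periods must be expressed in the same unit as the sampling interval, so hourly data with a daily cycle would use 24 instead of 1.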
Notice that the SSA Forecasting Transformer not only gives us a lower MAE and RMSE (1.963 and 2.491, respectively) but also provides 95% confidence bounds.
Now that we’ve found our model of interest, let’s use it to predict the future! We can retrain the model on all of the data, use CreateTimeSeriesEngine to get a predictor, and then call Predict() to forecast points up to the horizon we specified during training.
In this notebook, you learned how to do time series forecasting in ML.NET with Jupyter notebooks. We initially used linear regression with an engineered feature, but we were able to improve performance by relying on ML.NET's SSA forecaster.
To learn more about C# and Jupyter Notebooks, check out this GitHub repo.
To see another example of using ML.NET in Jupyter, check out this blog.
To learn about using DataFrames in C#, check out this blog.
To get started with Model Builder in Visual Studio, try this getting started tutorial.