Time-series and Deep Learning models: A Capstone Project with Massachusetts Institute of Technology
Published January 21, 2020

Over the past few months, we have been collaborating on a machine learning and deep learning capstone project with the Massachusetts Institute of Technology (MIT) Master of Business Analytics program. As we discussed in our previous blog post, Machine Learning for Sales Forecasting: A Capstone Project with Columbia University, capstone projects are applied, experimental projects in which students take what they have learned throughout their graduate program and apply it to a specific area of study.

 

MIT's Master of Business Analytics is a 12-month program focused on applying the tools of modern data science, optimization, and machine learning to solve real-world business problems. Students built an end-to-end time series solution using Azure Machine Learning, which is a cloud-based environment that you can use to train, deploy, automate, manage, and track ML models.

 

Azure Machine Learning can be used for any kind of machine learning, from classical models to deep learning, and for both supervised and unsupervised learning. Whether you prefer to write Python or R code or to use zero-code/low-code options such as the designer, you can build, train, and track highly accurate machine learning and deep learning models in an Azure Machine Learning workspace:

[Animated screenshot: creating an Azure Machine Learning workspace (create-workspace.gif)]

 

In this article, we follow an approach also used by the MIT students and show you how to configure and auto-train a time-series forecasting model. Configuring a forecasting model is similar to setting up a standard regression model using automated machine learning (automated ML or AutoML), but certain configuration options and pre-processing steps exist for working with time-series data. The following examples show you how to:

  • Prepare data for time series modeling
  • Configure specific time-series parameters in an AutoMLConfig object

You can find the full tutorial on how to train a time series forecasting model here. You can use automated ML to combine techniques and approaches and get a recommended, high-quality time-series forecast. An automated time-series experiment is treated as a multivariate regression problem. Past time-series values are “pivoted” to become additional dimensions for the regressor together with other predictors.
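
For intuition, the sketch below (plain pandas, not the automated ML featurizer itself; the column names mirror the sample data used later in this article) shows how past values of a series can be pivoted into lag columns that a standard regressor can consume alongside other predictors.

import pandas as pd

# Illustrative only: pivot past values of a series into lag columns that a
# standard regressor can use together with other predictors.
df = pd.DataFrame({
    "day_datetime": pd.date_range("2018-09-03", periods=10, freq="D"),
    "sales_quantity": [2000, 2300, 2100, 2400, 2450, 2500, 2350, 2600, 2550, 2700],
})
df["lag_1"] = df["sales_quantity"].shift(1)   # yesterday's sales
df["lag_7"] = df["sales_quantity"].shift(7)   # sales one week earlier
df = df.dropna()                              # rows without full lag history are dropped
print(df)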

 

You can configure how far into the future the forecast should extend (the forecast horizon), as well as lags and more. Automated ML learns a single, but often internally branched, model for all items in the dataset and all prediction horizons. More data is thus available to estimate model parameters, and generalization to unseen series becomes possible.

 

Features extracted from the training data play a critical role, and automated ML performs standard pre-processing steps and generates additional time-series features to capture seasonal effects and maximize predictive accuracy.
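
As a rough illustration (not the exact features automated ML generates internally), calendar attributes derived from the time column are the kind of seasonal signals meant here:

import pandas as pd

# Illustrative calendar features, similar in spirit to the time-based
# features automated ML derives from the time column.
dates = pd.Series(pd.date_range("2018-09-03", periods=5, freq="D"))
calendar_features = pd.DataFrame({
    "day_of_week": dates.dt.dayofweek,   # 0 = Monday
    "month": dates.dt.month,
    "quarter": dates.dt.quarter,
})
print(calendar_features)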

 

Time-series and Deep Learning models

Automated ML provides users with both native time-series and deep learning models as part of the recommendation system. These learners include:

  • Prophet
  • Auto-ARIMA
  • ForecastTCN

Automated ML's deep learning support allows forecasting of both univariate and multivariate time-series data.

Deep learning models have three intrinsic capabilities:

  1. They can learn arbitrary mappings from inputs to outputs
  2. They support multiple inputs and outputs
  3. They can automatically extract patterns in input data that span long sequences

Given larger amounts of data, deep learning models such as Microsoft's ForecastTCN can improve the accuracy of the resulting model.

 

Native time-series learners are also provided as part of automated ML. Prophet works best with time series that have strong seasonal effects and several seasons of historical data. It is accurate and fast, and it is robust to outliers, missing data, and dramatic changes in your time series.
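
Automated ML wraps Prophet for you, but for intuition a standalone fit looks roughly like the sketch below. It assumes the fbprophet package (published as prophet in newer releases) and uses a synthetic weekly-seasonal series; the variable names are illustrative.

import pandas as pd
import numpy as np
from fbprophet import Prophet   # published as "prophet" in newer releases

# Prophet expects a frame with columns 'ds' (datetime) and 'y' (target).
history = pd.DataFrame({
    "ds": pd.date_range("2018-01-01", periods=365, freq="D"),
    "y": 100 + 10 * np.sin(np.arange(365) * 2 * np.pi / 7),  # synthetic weekly pattern
})
model = Prophet()                                  # seasonalities are detected automatically
model.fit(history)
future = model.make_future_dataframe(periods=30)   # extend 30 days beyond the history
forecast = model.predict(future)
print(forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].tail())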

 

AutoRegressive Integrated Moving Average (ARIMA) is a popular statistical method for time-series forecasting. It is commonly used in short-term forecasting scenarios where the data shows evidence of trends such as cycles, which can be unpredictable and difficult to model or forecast. Auto-ARIMA transforms your data into stationary data to obtain consistent, reliable results.
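
Auto-ARIMA performs this transformation for you. Purely for illustration, a manual stationarity check and first difference on a synthetic trending series might look like this, using the augmented Dickey-Fuller test from statsmodels:

import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import adfuller

# Synthetic trending series: clearly non-stationary before differencing.
series = pd.Series(np.cumsum(np.random.randn(200)) + np.arange(200) * 0.5)

p_value = adfuller(series)[1]
print(f"ADF p-value (raw series): {p_value:.3f}")      # typically > 0.05

differenced = series.diff().dropna()                    # first difference removes the trend
p_value = adfuller(differenced)[1]
print(f"ADF p-value (differenced): {p_value:.3f}")      # typically < 0.05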

 

Preparing data

The most important difference between a forecasting task and a regression task within automated machine learning is the inclusion of a feature in your data that represents a valid time series. A regular time series has a well-defined and consistent frequency and has a value at every sample point in a continuous time span. Consider the following snapshot of a file sample.csv.

day_datetime,store,sales_quantity,week_of_year
9/3/2018,A,2000,36
9/3/2018,B,600,36
9/4/2018,A,2300,36
9/4/2018,B,550,36
9/5/2018,A,2100,36
9/5/2018,B,650,36
9/6/2018,A,2400,36
9/6/2018,B,700,36
9/7/2018,A,2450,36
9/7/2018,B,650,36

This data set is a simple example of daily sales data for a company that has two different stores, A and B. Additionally, there is a week_of_year feature that will allow the model to detect weekly seasonality. The field day_datetime represents a clean time series with daily frequency, and the field sales_quantity is the target column for running predictions. Read the data into a pandas DataFrame, then use the to_datetime function to ensure the time series is of datetime type.

import pandas as pd

# Read the sales data and parse the time column as datetimes.
data = pd.read_csv("sample.csv")
data["day_datetime"] = pd.to_datetime(data["day_datetime"])

In this case, the data is already sorted in ascending order by the time field day_datetime. However, when setting up an experiment, ensure the desired time column is sorted in ascending order to build a valid time series. Assume the data contains 1,000 records, and make a deterministic split in the data to create training and test data sets. Identify the label column name and set it to label; in this example, the label is sales_quantity. Then separate the label field from test_data to form the test_labels set.

# Deterministic split: first 950 rows for training, last 50 for testing.
train_data = data.iloc[:950]
test_data = data.iloc[-50:]

label = "sales_quantity"

# Remove the target column from the test set and keep it for evaluation.
test_labels = test_data.pop(label).values

 

When training a model for forecasting future values, ensure all the features used in training can be used when running predictions for your intended horizon. For example, when creating a demand forecast, including a feature for current stock price could massively increase training accuracy. However, if you intend to forecast with a long horizon, you may not be able to accurately predict future stock values corresponding to future time-series points, and model accuracy could suffer.

 

Configure and run experiment

For forecasting tasks, automated machine learning uses pre-processing and estimation steps that are specific to time-series data. The following pre-processing steps will be executed:

  • Detect time-series sample frequency (for example, hourly, daily, weekly) and create new records for absent time points to make the series continuous.
  • Impute missing values in the target (via forward-fill) and feature columns (using median column values)
  • Create grain-based features to enable fixed effects across different series
  • Create time-based features to assist in learning seasonal patterns
  • Encode categorical variables to numeric quantities
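
Automated ML performs these steps internally. Purely as a conceptual sketch of the first two steps (frequency alignment and imputation), applied to the data frame loaded earlier, the logic resembles the following; it is not the actual featurization code:

# Conceptual illustration only; automated ML does this for you.
filled = []
for store, group in data.set_index("day_datetime").groupby("store"):
    group = group.asfreq("D")                                  # add rows for any missing days
    group["store"] = store                                     # restore the grain value on new rows
    group["sales_quantity"] = group["sales_quantity"].ffill()  # forward-fill the target
    group["week_of_year"] = group["week_of_year"].fillna(
        group["week_of_year"].median())                        # median-impute feature columns
    filled.append(group)
regular_data = pd.concat(filled).reset_index()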

The AutoMLConfig object defines the settings and data necessary for an automated machine learning task. Similar to a regression problem, you define standard training parameters like task type, number of iterations, training data, and number of cross-validations. For forecasting tasks, there are additional parameters that must be set that affect the experiment. The following list explains each parameter and its usage.

  • time_column_name: Used to specify the datetime column in the input data used for building the time series and inferring its frequency.
  • grain_column_names: Name(s) defining individual series groups in the input data. If grain is not defined, the data set is assumed to be one time series.
  • max_horizon: Defines the maximum desired forecast horizon in units of time-series frequency. Units are based on the time interval of your training data (for example, monthly or weekly) that the forecaster should predict out.
  • target_lags: Number of rows to lag the target values based on the frequency of the data. The lag is represented as a list or a single integer. Lag should be used when the relationship between the independent variables and the dependent variable does not match up or correlate by default. For example, when trying to forecast demand for a product, the demand in any month may depend on the price of specific commodities 3 months prior. In this example, you may want to lag the target (demand) negatively by 3 months so that the model is trained on the correct relationship.
  • target_rolling_window_size: The number of historical periods used to generate forecasted values, no larger than the training set size. If omitted, the full training set size is used. Specify this parameter when you only want to consider a certain amount of history when training the model.
  • enable_dnn: Enable forecasting DNNs.

 

See the reference documentation for more information. Create the time-series settings as a dictionary object. Set the time_column_name to the day_datetime field in the data set. Define the grain_column_names parameter to ensure that two separate time-series groups are created for the data: one for store A and one for store B. You could set these values explicitly, for example a max_horizon of 50 to predict for the entire test set, a target_rolling_window_size of 10 periods, and a single target lag of two periods with target_lags; however, it is recommended to set max_horizon, target_rolling_window_size, and target_lags to "auto", which detects appropriate values for you automatically. In the example below, the "auto" settings are used for these parameters.

time_series_settings = {
    "time_column_name": "day_datetime",
    "grain_column_names": ["store"],
    "max_horizon": "auto",
    "target_lags": "auto",
    "target_rolling_window_size": "auto",
    "preprocess": True,
}

By defining the grain_column_names in the code snippet above, AutoML will create two separate time-series groups, also known as multiple time-series. If no grain is defined, AutoML will assume that the dataset is a single time-series. Now create a standard AutoMLConfig object, specifying the forecasting task type, and submit the experiment. After the model finishes, retrieve the best run iteration:

from azureml.core.workspace import Workspace
from azureml.core.experiment import Experiment
from azureml.train.automl import AutoMLConfig
import logging

# Standard AutoML settings, plus the time-series settings defined above.
automl_config = AutoMLConfig(task='forecasting',
                             primary_metric='normalized_root_mean_squared_error',
                             experiment_timeout_minutes=15,
                             enable_early_stopping=True,
                             training_data=train_data,
                             label_column_name=label,
                             n_cross_validations=5,
                             enable_ensembling=False,
                             verbosity=logging.INFO,
                             **time_series_settings)

# Connect to the workspace, submit the experiment, and retrieve the best run.
ws = Workspace.from_config()
experiment = Experiment(ws, "forecasting_example")
local_run = experiment.submit(automl_config, show_output=True)
best_run, fitted_model = local_run.get_output()
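
Once the experiment completes, you can evaluate the best pipeline on the held-out period. The sketch below uses the forecast() method exposed by automated ML forecasting models together with the test_data and test_labels created earlier; treat it as an illustrative evaluation rather than a verbatim recipe.

from sklearn.metrics import mean_absolute_error

# Generate predictions over the held-out period with the best fitted pipeline;
# forecast() returns the predictions along with the transformed input frame.
y_pred, X_trans = fitted_model.forecast(test_data)

print("MAE on the test period:", mean_absolute_error(test_labels, y_pred))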

 

Final resources to learn more

To learn more, you can read the Azure Machine Learning documentation on automated machine learning for time-series forecasting, along with the accompanying sample notebooks.

 

 

 

 
