Chenhui Hu, Vanja Paunic, Hong Ooi, Tao Wu, Wee Hyong Tok
Time series forecasting is one of the most important topics in data science. As a business owner, you might want to predict many kinds of future events to make better decisions and optimize your resource allocation. Typical use cases include retail sales forecasting, package shipment delay forecasting, energy demand forecasting, and financial forecasting. As you can see, forecasting is everywhere! Given its ubiquity and wide-ranging business applications, we have developed an open-source forecasting repo that puts world-class models and forecasting best practices in the hands of data scientists and industry experts, i.e., you!
Figure 1: Visualization of training and testing iterations of a sales forecasting scenario using a LightGBM model
Forecasting Best Practices and Solution Accelerators
This repository provides examples of building forecasting solutions, presented as Python Jupyter notebooks and R markdown files, along with a library of utility functions. Our goal is to help you, as a data scientist or machine learning engineer with any level of forecasting knowledge, to:
Learn best practices for the development of forecasting solutions in a variety of languages.
Leverage recent advances in forecasting algorithms to build high-performance solutions and operationalize them.
Accelerate the solution development process for real-world forecasting problems. The provided examples simplify the path from defining the business problem to developing a solution, which can significantly reduce the “time to market”.
In the repository, you will find state-of-the-art (SOTA) forecasting models using traditional machine learning and deep learning approaches. Implementations of SOTA models in this release are centered around retail sales forecasting and are written in Python and R, two of the most popular programming languages in the forecasting domain. To enable high-throughput forecasting scenarios, we have included notebooks for forecasting multiple time series with distributed training techniques such as Ray in Python, the parallel package in R, and multi-threading in LightGBM. The following is a quick summary of the forecasting models covered in this repository.
Automated forecasting procedure based on an additive model with non-linear trends and Tidyverts framework
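The idea of forecasting many time series independently and in parallel, mentioned above, can be sketched with the Python standard library alone. This is an illustrative sketch and not code from the repository: `ThreadPoolExecutor` stands in for a distributed framework such as Ray, a seasonal-naive forecaster stands in for a real model, and the function names and sample data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def seasonal_naive_forecast(history, season_length, horizon):
    """Forecast `horizon` steps by repeating the last observed season."""
    if len(history) < season_length:
        raise ValueError("history must cover at least one full season")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]


def forecast_all(series_by_id, season_length, horizon, max_workers=4):
    """Forecast every series independently, submitting one task per series."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            series_id: pool.submit(
                seasonal_naive_forecast, values, season_length, horizon
            )
            for series_id, values in series_by_id.items()
        }
        # Collect results keyed by series id once each task finishes.
        return {series_id: fut.result() for series_id, fut in futures.items()}


# Hypothetical sales histories for two store/brand combinations
# (assumed season length of 4 periods, chosen only for illustration).
sales = {
    "store1_brandA": [10, 12, 15, 11, 13, 16, 12, 14],
    "store2_brandB": [5, 7, 6, 8, 6, 8, 7, 9],
}
forecasts = forecast_all(sales, season_length=4, horizon=6)
for series_id, values in forecasts.items():
    print(series_id, values)
```

Because each series is fitted and forecast on its own, the work splits cleanly into independent tasks. For CPU-bound model training, a process pool or a cluster framework would likely be a better fit than threads; the task-per-series structure would stay the same.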
The repository also comes with Azure Machine Learning (Azure ML) notebooks and best-practice recipes to accelerate the development of scalable, production-grade forecasting solutions on Azure. You will find the following examples for forecasting with Azure AutoML as well as tuning and deploying a forecasting model on Azure.