Introducing Reinforcement Learning on Azure Machine Learning

Microsoft

May 19, 2020

We are excited to announce the preview of Reinforcement Learning on Azure Machine Learning. Reinforcement learning is an approach to machine learning that trains agents to make a sequence of decisions. The technique has gained popularity over the last few years following breakthroughs in teaching reinforcement learning agents to excel at complex tasks such as playing video games. It also has many practical real-world use cases, including robotics, chemistry, online recommendations, and advertising.

What is reinforcement learning?

In reinforcement learning, the goal is to train a policy that chooses an agent’s actions based on the agent’s observations of its environment. Each action leads to new observations and to a reward. Because rewards can be delayed, the full payoff of a sequence of actions may only arrive many steps later. Learning a policy therefore involves many trial-and-error runs in which the agent interacts with the environment and gradually improves its policy.
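To make the observe, act, reward loop and the idea of delayed reward concrete, here is a minimal, self-contained sketch. It uses tabular Q-learning, one classic algorithm chosen purely for illustration (it is not necessarily what the Azure Machine Learning samples use), on a hypothetical six-cell corridor where the only reward arrives at the far end. The environment, the `step` and `train` helpers, and the hyperparameter values are all assumptions made for this example.

```python
import random

# Hypothetical toy environment: a corridor of N_STATES cells. The agent starts
# in cell 0 and receives a reward of 1.0 only when it reaches the last cell,
# so the payoff for early actions is delayed by several steps.
N_STATES = 6
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done) for one action."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    if next_state == N_STATES - 1:
        return next_state, 1.0, True
    return next_state, 0.0, False


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    # q[s][a] estimates the discounted future reward of taking action a in state s.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: mostly exploit the current estimates, sometimes
            # explore at random (the "trial and error" part).
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.choice(ACTIONS)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action)
            # Q-learning update: nudge the estimate toward
            # reward + discounted value of the best next action.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train()
# Greedy action for every non-terminal cell (1 means "move right").
policy = [0 if q[0] > q[1] else 1 for q in q_table[:-1]]
print(policy)
```

Even though moving right from cell 0 earns nothing immediately, the discounted reward from the exit propagates backward through repeated trials until the learned policy prefers moving right in every cell.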

The new reinforcement learning support in Azure Machine Learning service lets data scientists scale training to many powerful CPU- or GPU-enabled VMs using Azure Machine Learning compute clusters, which automatically provision, manage, and scale down these VMs to help control costs.

Learning reinforcement learning with Minecraft

We use reinforcement learning in many ways at Microsoft to improve our products and services. For example, Office uses reinforcement learning to improve the suggestions it makes to users in its apps. To help you get started, check out the sample notebooks that train an agent to navigate a lava maze in Minecraft using Azure Machine Learning.

The agent’s goal is to navigate a maze and reach the blue exit tile by walking along solid tiles. If the agent wanders off the solid tiles, it falls into lava and must start over. Each maze is randomly generated, so the agent must learn to generalize to layouts it has not seen before.
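The rules of the lava maze can be sketched as a tiny grid environment. This is a hypothetical stand-in written for illustration, not the Minecraft sample's environment code: the tile symbols, the `generate_maze` and `step` helpers, the grid size, and the reward values (+1 for the exit, -1 for lava) are all assumptions. The generator first carves a random path of solid tiles from the start to the exit so every maze is solvable, then scatters extra solid tiles so each layout differs.

```python
import random

LAVA, SOLID, EXIT = "L", ".", "E"
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}


def generate_maze(size=5, extra_solid=0.3, seed=None):
    """Hypothetical generator: carve a random path of solid tiles from the
    top-left start to the bottom-right exit, then sprinkle extra solid tiles."""
    rng = random.Random(seed)
    grid = [[LAVA] * size for _ in range(size)]
    r, c = 0, 0
    grid[r][c] = SOLID
    while (r, c) != (size - 1, size - 1):
        # Step right or down at random, so a walkable path always exists.
        if r == size - 1 or (c < size - 1 and rng.random() < 0.5):
            c += 1
        else:
            r += 1
        grid[r][c] = SOLID
    # Turn some remaining lava into solid ground so layouts vary between mazes.
    for i in range(size):
        for j in range(size):
            if grid[i][j] == LAVA and rng.random() < extra_solid:
                grid[i][j] = SOLID
    grid[size - 1][size - 1] = EXIT
    return grid


def step(grid, pos, move):
    """Return (new_pos, reward, done). Reward values are illustrative assumptions."""
    dr, dc = MOVES[move]
    r, c = pos[0] + dr, pos[1] + dc
    size = len(grid)
    if not (0 <= r < size and 0 <= c < size):
        return pos, 0.0, False        # bumped into the edge: stay in place
    if grid[r][c] == LAVA:
        return (r, c), -1.0, True     # fell into lava: episode ends, start over
    if grid[r][c] == EXIT:
        return (r, c), 1.0, True      # reached the exit tile
    return (r, c), 0.0, False         # ordinary solid tile


maze = generate_maze(seed=42)
print("\n".join("".join(row) for row in maze))
```

Generating a fresh maze for every episode is what forces a learned policy to generalize rather than memorize a single route.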

The first step for a data scientist is to develop the training script; the training script for the Minecraft sample is available on GitHub. A typical workflow involves iterative development using a combination of local or cloud-hosted notebooks and development tools such as Visual Studio Code or PyCharm. Azure Machine Learning Compute Instance is a cloud-hosted Jupyter notebook server that enables rapid iteration using cloud resources.

Once a data scientist has a Python training script, expanding training to multiple nodes is simple. After creating compute clusters in the Azure Machine Learning Studio UI or with Python SDK calls, the data scientist submits an agent training job using the Azure Machine Learning ReinforcementLearningEstimator. The following example sets up a training configuration that runs Minecraft on 8 worker compute nodes to collect training data.

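from azureml.contrib.train.rl import ReinforcementLearningEstimator, WorkerConfiguration

# cpu_cluster, gpu_cluster, cpu_minecraft_environment, gpu_minecraft_environment,
# and experiment are assumed to have been created beforehand.

# Eight CPU worker nodes run Minecraft simulations to collect experience.
worker_config = WorkerConfiguration(
    compute_target=cpu_cluster,
    node_count=8,
    environment=cpu_minecraft_environment)

# The head node runs the training script on the GPU cluster.
estimator = ReinforcementLearningEstimator(
    source_directory='files',
    entry_script='minecraft_train.py',
    compute_target=gpu_cluster,
    environment=gpu_minecraft_environment,
    worker_configuration=worker_config,
    max_run_duration_seconds=2 * 60 * 60,  # stop the run after 2 hours
    shm_size=1024 * 1024 * 1024 * 30)      # 30 GB of shared memory for the container

run = experiment.submit(estimator)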

Azure Machine Learning automatically allocates compute nodes in the compute target, loads them with container images containing Minecraft and the simulation code, and starts running the training script. After training completes, the compute nodes are automatically deallocated according to the user-defined policy, so you avoid incurring extra charges.

Data scientists can track training progress in several ways, including TensorBoard, within the Jupyter notebook, and in Azure Machine Learning Studio. Here we show how the training reward increases over time in Azure Machine Learning Studio.

After training completes, the agent can be evaluated to see how well it performs. In the animation below, the agent successfully navigates the maze! Training this agent takes around 90 minutes using the configuration in the Minecraft code sample.

Training Agents on Azure Machine Learning

Azure Machine Learning customers are applying Reinforcement Learning on Azure Machine Learning to industrial and other applications. We see customers training reinforcement learning agents on up to 512 cores or running their training over multiple days. In practice, training an agent can take millions of trial runs. These trial runs happen automatically and rapidly in parallel while the system continuously learns and improves. Azure Machine Learning uses the Ray framework to distribute reinforcement learning training and support large-scale workloads.
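The parallel trial runs follow a fan-out/fan-in pattern: many workers collect experience independently, and a learner gathers their results into one batch. The sketch below illustrates only that pattern. Ray is swapped out for Python's standard `concurrent.futures.ThreadPoolExecutor` so it runs anywhere; threads merely demonstrate the structure, whereas Ray distributes work across processes and machines. The corridor episode, the `run_episode`, `collect_rollouts`, and `parallel_collect` helpers, and the fixed "probability of moving right" policy are hypothetical, and the learner's policy update is omitted.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

NUM_WORKERS = 8           # mirrors node_count=8 in the worker configuration above
EPISODES_PER_WORKER = 100


def run_episode(rng, policy_prob_right, length=5, max_steps=50):
    """Hypothetical corridor episode: reward 1.0 if the goal is reached in time."""
    pos = 0
    for _ in range(max_steps):
        pos = pos + 1 if rng.random() < policy_prob_right else max(0, pos - 1)
        if pos == length:
            return 1.0
    return 0.0


def collect_rollouts(worker_id, policy_prob_right):
    """One worker: run many independent trials with its own seeded RNG."""
    rng = random.Random(worker_id)
    return [run_episode(rng, policy_prob_right) for _ in range(EPISODES_PER_WORKER)]


def parallel_collect(policy_prob_right):
    # Fan out: every worker runs its trials concurrently with the others.
    with ThreadPoolExecutor(max_workers=NUM_WORKERS) as pool:
        batches = pool.map(collect_rollouts, range(NUM_WORKERS),
                           [policy_prob_right] * NUM_WORKERS)
        # Fan in: the learner gathers all workers' results into a single batch.
        return [reward for batch in batches for reward in batch]


rewards = parallel_collect(policy_prob_right=0.7)
print(len(rewards), mean(rewards))
```

Seeding each worker's random generator with its worker ID keeps the collected batch reproducible, which makes it easier to compare training runs.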

To train agents on Azure Machine Learning, data scientists use standard machine learning tools, including the Azure Machine Learning Python SDK, the Azure Machine Learning Studio UI to monitor and manage progress, and the command line interface. Azure Machine Learning simplifies running reinforcement learning on remote compute clusters, including tracking experiment results in TensorBoard and Azure Machine Learning Studio.