A team of machine learning researchers from Oxford University have showcased their AI research on the Microsoft stand at the AI Summit. They combined their work on deep reinforcement learning with their use of Microsoft Azure in a fun and engaging demonstration that brings human players and AI together to fight the StarCraft bots.
This work is being carried out by the Whiteson research lab in collaboration with PhD students from the Engineering department in Oxford. They are using StarCraft as a platform for developing and testing novel methods in deep multi-agent reinforcement learning. The effort is based on an open source platform which allows Torch code to interact with the game StarCraft: Brood War. The Oxford team plans to release their codebase to the public after the publication of their next paper, which has been submitted for review. Unlike work undertaken by other research institutes using StarCraft, their effort is focussed on decentralised execution, meaning that each unit has to take independent decisions during the game based on local and incomplete observations. The project was initially developed on on-site servers. However, shifting to Azure has been straightforward and has allowed Oxford to massively expand the number of experiments undertaken and the scope of the research as a whole.
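To make the idea of decentralised execution more concrete, here is a minimal Python sketch. It is not the Oxford team's code: the unit dictionaries, the `sight_range` parameter and the function names `local_observation` and `act` are all hypothetical, chosen only for illustration. Each unit sees only what is within its own sight range and picks its own action from its own action values, without access to the global game state.

```python
import random


def local_observation(unit, all_units, sight_range):
    """Return the positions of other units that `unit` can see.

    Hypothetical representation: each unit is a dict with a "pos" (x, y) tuple.
    Anything outside `sight_range` is invisible, so the observation is
    local and incomplete.
    """
    ux, uy = unit["pos"]
    visible = []
    for other in all_units:
        if other is unit:
            continue
        ox, oy = other["pos"]
        if (ox - ux) ** 2 + (oy - uy) ** 2 <= sight_range ** 2:
            visible.append(other["pos"])
    return visible


def act(q_values, epsilon, rng):
    """Epsilon-greedy choice over one unit's own action values."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit


# Illustrative usage: three units, only two of them close to each other.
units = [{"pos": (0, 0)}, {"pos": (1, 0)}, {"pos": (10, 10)}]
rng = random.Random(0)
seen_by_first = local_observation(units[0], units, sight_range=2)
greedy_action = act([0.1, 0.7, 0.2], epsilon=0.0, rng=rng)
```

In this sketch the first unit never learns about the distant third unit, which is the kind of partial information each agent has to cope with when no central controller issues the orders.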
Stabilising Experience Replay for Deep Multi-Agent Reinforcement Learning
"Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A key stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep RL relies. This paper proposes two methods that address this problem: 1) conditioning each agent's value function on a footprint that disambiguates the age of the data sampled from the replay memory and 2) using a multi-agent variant of importance sampling to naturally decay obsolete data. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL."
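The first method in the abstract, conditioning on a footprint that reveals the age of replayed data, can be illustrated with a small sketch. This is an assumption-laden illustration rather than the paper's implementation: the class name `FootprintReplay` is hypothetical, and the choice of footprint here (the training iteration and the exploration rate at the time the transition was collected) is an assumption made for the example. The idea shown is simply that each stored observation carries a tag telling the learner how old the sample is.

```python
import random
from collections import deque


class FootprintReplay:
    """Replay memory that tags every observation with a footprint.

    Assumption for illustration: the footprint is the pair
    (training iteration, exploration rate) at collection time.
    """

    def __init__(self, capacity):
        # A bounded deque silently drops the oldest transition when full.
        self.buffer = deque(maxlen=capacity)

    def add(self, obs, action, reward, next_obs, iteration, epsilon):
        footprint = (float(iteration), float(epsilon))
        # Simplification: the same footprint is appended to both observations.
        self.buffer.append(
            (tuple(obs) + footprint, action, reward, tuple(next_obs) + footprint)
        )

    def sample(self, batch_size, rng=random):
        k = min(batch_size, len(self.buffer))
        return rng.sample(list(self.buffer), k)


# Illustrative usage: capacity 2, so the first transition is evicted.
memory = FootprintReplay(capacity=2)
memory.add([0.0, 1.0], action=0, reward=0.0, next_obs=[0.5, 1.0], iteration=1, epsilon=1.0)
memory.add([0.5, 1.0], action=1, reward=1.0, next_obs=[1.0, 1.0], iteration=2, epsilon=0.9)
memory.add([1.0, 1.0], action=0, reward=0.0, next_obs=[1.5, 1.0], iteration=3, epsilon=0.8)
batch = memory.sample(2, rng=random.Random(0))
```

A value function trained on these augmented observations can, in principle, learn to treat samples gathered under older behaviour of the other agents differently from recent ones, which is the intuition the abstract describes.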