Deep reinforcement learning (DRL) is an exciting area of artificial intelligence research with potential applications to many problems. Some see DRL as a path to artificial general intelligence (AGI), because it mirrors the way humans learn: by exploring and receiving feedback from the environment. Recently, DRL agents have beaten human video game players, and bipedal agents have learned to walk in simulated environments. These developments have fueled enthusiasm for the field.
Unlike supervised learning, in which a model is trained on known labels, reinforcement learning trains a model by having an agent interact with an environment. When the agent’s behavior produces the desired result, for example when it scores points or wins a game, it receives positive feedback. In short, researchers reinforce the agent’s good behavior.
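The feedback loop described above can be sketched in a few lines. This is a minimal illustration, not taken from any particular toolkit: the two-armed "slot machine" environment, the payout probabilities, and the names `pull` and `train` are all assumptions made for the example.

```python
import random


def pull(arm, rng):
    """Hypothetical environment: arm 1 pays out more often than arm 0."""
    win_probability = [0.2, 0.8][arm]
    return 1.0 if rng.random() < win_probability else 0.0


def train(episodes=2000, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    values = [0.0, 0.0]  # the agent's running estimate of each arm's reward
    counts = [0, 0]      # how often each arm has been tried
    for _ in range(episodes):
        # Explore occasionally; otherwise exploit the best-looking action.
        if rng.random() < epsilon:
            arm = rng.randrange(2)
        else:
            arm = max(range(2), key=lambda a: values[a])
        reward = pull(arm, rng)
        counts[arm] += 1
        # Incremental mean: actions that get rewarded see their estimate rise.
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts


values, counts = train()
```

After training, the agent should have tried the better-paying arm far more often than the other one, which is the "strengthening of good behavior" in miniature.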
One of the key challenges in applying DRL to practical problems is designing a reward function that encourages the desired behavior without side effects.
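To make the side-effect problem concrete, here is a small sketch under invented assumptions: an agent can reach its goal either by a shortcut that knocks over a vase or by a longer detour that does not. The `Outcome` fields, both reward functions, and the penalty value are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    reached_goal: bool
    steps: int
    vases_broken: int  # hypothetical side effect the designer cares about


def naive_reward(outcome):
    # Rewards reaching the goal quickly and ignores everything else.
    return (10.0 if outcome.reached_goal else 0.0) - 0.1 * outcome.steps


def careful_reward(outcome, penalty=5.0):
    # Same objective, plus an explicit penalty for the unwanted side effect.
    return naive_reward(outcome) - penalty * outcome.vases_broken


shortcut = Outcome(reached_goal=True, steps=4, vases_broken=1)
detour = Outcome(reached_goal=True, steps=8, vases_broken=0)

naive_choice = max([shortcut, detour], key=naive_reward)
careful_choice = max([shortcut, detour], key=careful_reward)
```

Under the naive reward the destructive shortcut scores higher, while the version with a penalty prefers the detour. In real problems the difficulty is that the designer rarely knows every side effect in advance.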
It is worth noting that although deep reinforcement learning (the “deep” refers to the fact that the underlying model is a deep neural network) is still a relatively new field, reinforcement learning itself has been around since the 1970s or earlier.
As Andrej Karpathy, a former student of computer vision leader Fei-Fei Li, a former research scientist at OpenAI and now director of AI at Tesla, pointed out in a 2016 blog post, key DRL research such as AlphaGo and Atari Deep Q-Learning is based on algorithms that have been around for some time, with deep learning replacing other methods of function approximation. Of course, the ability to use deep learning at all owes much to the explosive growth of cheap computing power over the past 20 years.
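As a rough illustration of the older algorithm underneath, the sketch below runs tabular Q-learning on a tiny corridor. The corridor, its size, and the hyperparameters are assumptions made for the example. The table `q` is the part that, in deep Q-learning, would be replaced by a neural network approximating the same function.

```python
import random

N_STATES, GOAL = 5, 4  # hypothetical corridor with the goal at the right end
ACTIONS = (0, 1)       # 0 = move left, 1 = move right


def step(state, action):
    """Move one cell, clamp at the walls, reward 1.0 on reaching the goal."""
    move = 1 if action == 1 else -1
    next_state = min(max(state + move, 0), N_STATES - 1)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # one value per (state, action)
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                # Break ties randomly so the untrained agent still moves around.
                best = max(q[state])
                action = rng.choice([a for a in ACTIONS if q[state][a] == best])
            next_state, reward, done = step(state, action)
            # Q-learning target: reward plus discounted best next value.
            target = reward + (0.0 if done else gamma * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = q_learning()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
```

With only five states a table is enough. For inputs such as raw game frames the number of states is far too large to enumerate, which is where a learned function approximator comes in.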
The promise of DRL, coupled with Google’s acquisition of DeepMind for $500 million in 2014, has led many startups to try to capitalize on the technology. As interest in DRL grows, we are also seeing new open source toolkits and training environments for DRL agents. Most of these frameworks are essentially specialized simulation tools or interfaces to them. The following are several toolkits worth noting:
OpenAI Gym is a popular toolkit for developing and comparing reinforcement learning models. Its simulator interface supports a variety of environments, including classic Atari games as well as robotics and physics simulators such as MuJoCo and the DARPA-funded Gazebo. Like other DRL toolkits, it provides APIs that feed observations and rewards back to agents.
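The shape of such an API can be sketched without installing anything. The class below is a stand-in written for this example, not Gym itself: it loosely follows the classic Gym convention of a `reset` method and a `step` method that returns an observation, a reward, a done flag, and an info dictionary. The exact return signature differs between Gym versions, so treat the details as an assumption.

```python
class CorridorEnv:
    """Hypothetical 1-D corridor: start at cell 0, goal at the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.position = 0
        return self.position

    def step(self, action):
        # action 1 moves right, anything else moves left; walls clamp movement.
        move = 1 if action == 1 else -1
        self.position = min(max(self.position + move, 0), self.length - 1)
        done = self.position == self.length - 1
        reward = 1.0 if done else 0.0
        return self.position, reward, done, {}


def run_episode(env, policy, max_steps=100):
    """Feed observations to the policy and accumulate the rewards it earns."""
    observation = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(observation)
        observation, reward, done, _info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(observation):
    return 1


total = run_episode(CorridorEnv(), always_right)
```

Because every environment exposes the same two methods, the same `run_episode` loop could drive an Atari game or a physics simulation, which is what makes comparing models across environments practical.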
DeepMind Lab is a 3D learning environment based on the first-person shooter Quake III that provides navigation and learning tasks for training agents. DeepMind recently added the DMLab-30 suite of agent training tasks and introduced the new IMPALA distributed agent training architecture.
Psychlab, another DeepMind toolkit, was open-sourced earlier this year. It extends DeepMind Lab to support cognitive psychology experiments, such as searching for a specific target among a set of objects or detecting changes in a set of objects. Researchers can then compare the performance of humans and agents on these tasks.
A collaboration between researchers at the University of California, Berkeley and Facebook AI, House3D offers more than 45,000 simulated indoor scenes with realistic room and furniture layouts. The main task described in the paper that introduced House3D is “concept-driven navigation”, such as training an agent to navigate to a room in a house given only a high-level descriptor such as “dining room”.
Unity Machine Learning Agents
Under the leadership of Danny Lange (VP of AI and ML), game engine developer Unity is working to integrate advanced AI technology into its platform. Unity Machine Learning Agents, released in September 2017, is an open source Unity plug-in that allows games and simulations running on the platform to serve as environments for training agents.
While the other tools listed here focus mainly on DRL training environments, Ray is more about the infrastructure for DRL. Developed by Ion Stoica and his team at Berkeley’s RISELab, Ray is a framework for efficiently running Python code on clusters and large multi-core machines. Its goal is to provide a low-latency distributed execution framework for reinforcement learning.
The emergence of all these tools and platforms will make DRL more accessible to developers and researchers. They will need all the help they can get, though, because deep reinforcement learning is difficult to put into practice. Google engineer Alex Irpan recently published an article titled “Deep Reinforcement Learning Doesn’t Work Yet” explaining why. Irpan cites the large amount of data DRL requires, the fact that most DRL methods do not exploit prior knowledge about the systems and environments involved, and the difficulty, mentioned earlier, of coming up with an effective reward function.
From both a research and an application perspective, deep reinforcement learning is likely to remain a hot topic in artificial intelligence. It shows great potential for handling complex, multifaceted decision-making problems, which makes it useful not only for industrial systems and games but also in fields such as marketing, advertising, finance, education, and even data science itself.