# Continuous control with deep reinforcement learning

@article{Lillicrap2016ContinuousCW, title={Continuous control with deep reinforcement learning}, author={Timothy P. Lillicrap and Jonathan J. Hunt and Alexander Pritzel and Nicolas Manfred Otto Heess and Tom Erez and Yuval Tassa and David Silver and Daan Wierstra}, journal={CoRR}, year={2016}, volume={abs/1509.02971} }

We adapt the ideas underlying the success of Deep Q-Learning to the continuous action domain. [...] Our algorithm is able to find policies whose performance is competitive with those found by a planning algorithm with full access to the dynamics of the domain and its derivatives. We further demonstrate that for many of the tasks the algorithm can learn policies end-to-end: directly from raw pixel inputs.
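The abstract leaves the update rules implicit. Here is a minimal sketch. It assumes the standard DDPG setup: a target actor mu' and a target critic Q' that trail the learned networks through soft updates at a small rate tau. Under that assumption, the critic's regression target y = r + gamma * Q'(s', mu'(s')) and the soft update can be written in plain Python. The linear "networks", the function names and the example numbers are hypothetical stand-ins, not the paper's architecture.

```python
from typing import List

# Hypothetical linear stand-ins for the target actor mu'(s) and target critic Q'(s, a).
def target_actor(state: List[float], weights: List[float]) -> float:
    """Deterministic policy: a = w . s (a 1-D action keeps the sketch small)."""
    return sum(w * s for w, s in zip(weights, state))

def target_critic(state: List[float], action: float, weights: List[float]) -> float:
    """Linear critic over the concatenated [state, action] vector."""
    features = state + [action]
    return sum(w * x for w, x in zip(weights, features))

def critic_target(reward: float, next_state: List[float], done: bool,
                  actor_w: List[float], critic_w: List[float], gamma: float = 0.99) -> float:
    """y = r + gamma * Q'(s', mu'(s')); no bootstrap on terminal transitions (an assumption)."""
    if done:
        return reward
    next_action = target_actor(next_state, actor_w)
    return reward + gamma * target_critic(next_state, next_action, critic_w)

def soft_update(target: List[float], online: List[float], tau: float = 0.001) -> List[float]:
    """theta' <- tau * theta + (1 - tau) * theta', applied to every parameter."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target, online)]

y = critic_target(reward=1.0, next_state=[0.5, -0.5], done=False,
                  actor_w=[1.0, 1.0], critic_w=[1.0, 0.0, 2.0], gamma=0.9)
print(y)
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.1))
```

Skipping the bootstrap term on terminal transitions is a common implementation choice rather than something stated in the abstract. The actor update, which follows the critic's gradient with respect to the action, is omitted.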

#### Supplemental Content

GitHub repository (via Papers with Code): PyTorch implementation of Deep Deterministic Policy Gradients (DDPG) for continuous control

#### 5,514 Citations

Continuous Deep Q-Learning with Model-based Acceleration

- Computer Science
- ICML
- 2016

This paper derives a continuous variant of the Q-learning algorithm, which it calls normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods, and shows that it substantially improves performance on a set of simulated robotic control tasks.
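NAF makes the greedy action easy to find by making the advantage quadratic in the action: Q(s, a) = V(s) - 1/2 (a - mu(s))^T P(s) (a - mu(s)), with P(s) = L(s) L(s)^T and L lower-triangular. The maximizing action is therefore mu(s). Below is a minimal sketch for a single state. It assumes the network has already produced V, mu and L; the numbers are hypothetical. In NAF the diagonal of L is exponentiated so that P is positive definite, and the example simply uses a positive diagonal.

```python
from typing import List

Matrix = List[List[float]]

def precision_from_cholesky(lower: Matrix) -> Matrix:
    """P = L L^T, positive definite when L's diagonal is positive."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

def naf_q_value(action: List[float], value: float, mu: List[float], lower: Matrix) -> float:
    """Q(s, a) = V(s) - 1/2 (a - mu)^T P (a - mu)."""
    p = precision_from_cholesky(lower)
    diff = [a - m for a, m in zip(action, mu)]
    n = len(diff)
    quad = sum(diff[i] * p[i][j] * diff[j] for i in range(n) for j in range(n))
    return value - 0.5 * quad

# Hypothetical network outputs for one state.
lower = [[1.0, 0.0], [0.5, 2.0]]
mu = [0.5, -0.25]
print(naf_q_value(mu, 1.0, mu, lower))           # greedy action: Q equals V, prints 1.0
print(naf_q_value([1.5, -0.25], 1.0, mu, lower))  # prints 0.5
```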

The Beta Policy for Continuous Control Reinforcement Learning

- 2017

Recently, reinforcement learning with deep neural networks has achieved great success in challenging continuous control problems such as 3D locomotion and robotic manipulation. However, in real-world…

Deep Reinforcement Learning in Parameterized Action Space

- Computer Science
- 2016

This paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs, which feature a small set of discrete action types, each of which is parameterized with continuous variables.

Particle-Based Adaptive Discretization for Continuous Control using Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2020

This paper proposes a general, yet simple, framework for improving the action exploration of policy gradient DRL algorithms that adapts ideas from the particle filtering literature to dynamically discretize the continuous action space and track policies represented as a mixture of Gaussians.
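A minimal sketch of the policy representation alone: a 1-D action is drawn from a mixture of Gaussians whose components sit at hypothetical particle locations with normalized weights. The paper's particle resampling and adaptive discretization steps are not shown. The clipping to the action bounds is an assumption made for this illustration.

```python
import random
from typing import List

def sample_mixture_action(means: List[float], weights: List[float], std: float,
                          low: float, high: float, rng: random.Random) -> float:
    """Pick a component in proportion to its weight, sample around its mean, clip to bounds."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    probs = [w / total for w in weights]
    idx = rng.choices(range(len(means)), weights=probs, k=1)[0]
    action = rng.gauss(means[idx], std)
    return min(max(action, low), high)

rng = random.Random(42)
# Hypothetical particles: two action modes at -0.6 and +0.6, the second weighted more heavily.
means, weights = [-0.6, 0.6], [0.25, 0.75]
samples = [sample_mixture_action(means, weights, 0.1, -1.0, 1.0, rng) for _ in range(2_000)]
share_right = sum(1 for a in samples if a > 0) / len(samples)
print(f"fraction of actions near the +0.6 mode: {share_right:.2f}")
```

Unlike a single Gaussian, the mixture can keep several separated action modes alive at once, which is the kind of exploration behavior the summary refers to.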

Improving Stochastic Policy Gradients in Continuous Control with Deep Reinforcement Learning using the Beta Distribution

- Computer Science
- ICML
- 2017

It is shown that the Beta policy is bias-free and provides significantly faster convergence and higher scores than the Gaussian policy when both are used with trust region policy optimization and actor-critic with experience replay (the state-of-the-art on-policy and off-policy stochastic methods, respectively) on OpenAI Gym's and MuJoCo's continuous control environments.
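One plausible reading of the bias argument is that a Gaussian has infinite support, so on a bounded action space its out-of-range samples must be clipped and pile up at the bounds. A Beta distribution lives on [0, 1] and only needs rescaling. The sketch below compares the two using Python's `random.betavariate` and `random.gauss`. The bounds, means and shape parameters are hypothetical.

```python
import random

def beta_action(alpha: float, beta: float, low: float, high: float,
                rng: random.Random) -> float:
    """Sample x ~ Beta(alpha, beta) on [0, 1] and rescale it to [low, high]."""
    x = rng.betavariate(alpha, beta)
    return low + (high - low) * x

def clipped_gaussian_action(mean: float, std: float, low: float, high: float,
                            rng: random.Random) -> float:
    """Sample from N(mean, std^2) and clip to the bounds, as a Gaussian policy would need to."""
    return min(max(rng.gauss(mean, std), low), high)

rng = random.Random(0)
beta_samples = [beta_action(2.0, 2.0, -1.0, 1.0, rng) for _ in range(10_000)]
gauss_samples = [clipped_gaussian_action(0.9, 0.5, -1.0, 1.0, rng) for _ in range(10_000)]
at_bound = sum(1 for a in gauss_samples if a == 1.0) / len(gauss_samples)
print(min(beta_samples), max(beta_samples))
print(f"clipped Gaussian mass exactly at the upper bound: {at_bound:.2f}")
```

With the mean near the upper bound, a large share of the clipped Gaussian's samples land exactly on 1.0, while the Beta samples stay inside the interval without any clipping.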

Continuous Control in Deep Reinforcement Learning with Direct Policy Derivation from Q Network

- Computer Science
- IHIET
- 2020

This work implements an efficient action derivation method that allows Q-learning to be used in real-time continuous control tasks, and shows that in some cases the proposed approach learns a smooth continuous policy while keeping the implementation simplicity of the original discrete-action-space Q-learning algorithm.

Using Deep Reinforcement Learning for the Continuous Control of Robotic Arms

- Computer Science
- ArXiv
- 2018

A new combination of two commonly used reinforcement learning methods is tested to see whether it learns more effectively than a baseline, reduces training time, and helps the algorithm converge.

Deep Reinforcement Learning in Parameterized Action Space

- Computer Science
- ICLR
- 2016

This paper represents a successful extension of deep reinforcement learning to the class of parameterized action space MDPs within the domain of simulated RoboCup soccer, which features a small set of discrete action types, each of which is parameterized with continuous variables.
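A parameterized action is a discrete type plus that type's own continuous arguments. The sketch below makes one assumption about the actor's output: one score per action type and a flat vector holding every type's parameters. It picks the highest-scoring type and slices out only its parameters. The action names and parameter counts are hypothetical, loosely modeled on soccer-style actions.

```python
from typing import Dict, List, Tuple

# Hypothetical action types and how many continuous parameters each one takes.
PARAM_COUNTS: Dict[str, int] = {"dash": 2, "turn": 1, "tackle": 1, "kick": 2}

def select_parameterized_action(scores: List[float],
                                params: List[float]) -> Tuple[str, List[float]]:
    """Pick the discrete type with the highest score and return it with its own parameters.

    `scores` has one entry per action type; `params` concatenates every type's
    parameters in PARAM_COUNTS order (a flat layout assumed for illustration).
    """
    names = list(PARAM_COUNTS)
    if len(scores) != len(names) or len(params) != sum(PARAM_COUNTS.values()):
        raise ValueError("output sizes do not match PARAM_COUNTS")
    best = max(range(len(names)), key=lambda i: scores[i])
    offset = sum(PARAM_COUNTS[n] for n in names[:best])
    count = PARAM_COUNTS[names[best]]
    return names[best], params[offset:offset + count]

print(select_parameterized_action([0.1, 0.7, -0.2, 0.3],
                                  [50.0, 10.0, -30.0, 5.0, 80.0, 0.0]))  # ('turn', [-30.0])
```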

Deep Reinforcement Learning for Simulated Autonomous Vehicle Control

- 2016

We investigate the use of Deep Q-Learning to control a simulated car via reinforcement learning. We start by implementing the approach of [5] ourselves and then experiment with various possible…

Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces

- Computer Science, Mathematics
- ArXiv
- 2019

It is empirically demonstrated that MP-DQN significantly outperforms P-DQN and other previous algorithms in terms of data efficiency and converged policy performance on the Platform, Robot Soccer Goal, and Half Field Offense domains.
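In P-DQN a single forward pass receives every action type's parameters, so each Q-value can depend on parameters that belong to other actions. The multi-pass idea runs one pass per action type k, zeroes all parameters except action k's, and keeps only Q_k from that pass. A minimal sketch follows. The `toy_q_net` function is hypothetical and deliberately leaks every parameter into every output so that the difference is visible.

```python
from typing import Callable, List

# Hypothetical Q-network signature: (state, per-action parameters) -> one Q-value per action type.
QNet = Callable[[List[float], List[List[float]]], List[float]]

def multi_pass_q(q_net: QNet, state: List[float], params: List[List[float]]) -> List[float]:
    """Pass k sees only action k's parameters (others zeroed); keep Q_k from pass k."""
    q_values = []
    for k in range(len(params)):
        masked = [p if j == k else [0.0] * len(p) for j, p in enumerate(params)]
        q_values.append(q_net(state, masked)[k])
    return q_values

def toy_q_net(state: List[float], params: List[List[float]]) -> List[float]:
    """Toy network whose every output depends on all parameters of all actions."""
    flat_total = sum(sum(p) for p in params)
    return [sum(state) + flat_total + j for j in range(len(params))]

state, params = [0.5], [[1.0], [2.0, 3.0]]
print("single pass:", toy_q_net(state, params))
print("multi-pass: ", multi_pass_q(toy_q_net, state, params))
```

This loop runs the passes one after another for clarity. A practical implementation would more likely stack the K masked inputs into one batch.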

#### References

Showing 10 of 39 references.

From Pixels to Torques: Policy Learning with Deep Dynamical Models

- Computer Science, Mathematics
- ICML 2015
- 2015

This paper introduces a data-efficient, model-based reinforcement learning algorithm that learns a closed-loop control policy from pixel information only, and facilitates fully autonomous learning from pixels to torques.

Autonomous reinforcement learning with experience replay

- Computer Science, Medicine
- Neural Networks: the official journal of the International Neural Network Society
- 2013

A real-time reinforcement learning algorithm is presented that repeatedly adjusts the control policy with the use of previously collected samples, and autonomously estimates the appropriate step-sizes for the learning updates.
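"Previously collected samples" are usually kept in a replay buffer. The sketch below is a generic fixed-capacity buffer with uniform sampling. It does not reproduce the specific reuse and step-size scheme of the paper above, and the class and field names are hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple

class Transition(NamedTuple):
    state: List[float]
    action: List[float]
    reward: float
    next_state: List[float]
    done: bool

class ReplayBuffer:
    """Fixed-capacity store of past transitions; the oldest ones are evicted first."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self._data: Deque[Transition] = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._data.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform sampling without replacement within one batch."""
        return self._rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)

buf = ReplayBuffer(capacity=3)
for i in range(5):
    buf.add(Transition([float(i)], [0.0], 1.0, [float(i + 1)], False))
print(len(buf), sorted(t.state[0] for t in buf.sample(2)))
```

Only the three most recent transitions (states 2.0, 3.0 and 4.0) survive, because `deque(maxlen=...)` drops the oldest entry on overflow.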

Playing Atari with Deep Reinforcement Learning

- Computer Science
- ArXiv
- 2013

This work presents the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning, which outperforms all previous approaches on six of the games and surpasses a human expert on three of them.

End-to-End Training of Deep Visuomotor Policies

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2016

This paper develops a method that can be used to learn policies that map raw image observations directly to torques at the robot's motors, trained using a partially observed guided policy search method, with supervision provided by a simple trajectory-centric reinforcement learning method.

Memory-based control with recurrent neural networks

- Computer Science
- ArXiv
- 2015

This work extends two related model-free algorithms for continuous control to partially observed domains using recurrent neural networks trained with backpropagation through time, and finds that recurrent deterministic and stochastic policies learn similarly good solutions to these tasks, including the water maze, where the agent must learn effective search strategies.

Learning Continuous Control Policies by Stochastic Value Gradients

- Computer Science, Mathematics
- NIPS
- 2015

This paper presents a unified framework for learning continuous control policies using backpropagation, made possible by treating stochasticity in the Bellman equation as a deterministic function of exogenous noise.
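Treating randomness as a deterministic function of exogenous noise is what lets gradients flow through sampled actions. Write a = theta + sigma * xi with xi ~ N(0, 1). Then d/dtheta f(a) = f'(a), so averaging f'(a) over noise samples estimates the gradient of the expected value. The toy objective f(a) = -(a - 2)^2 and all numbers are hypothetical. This illustrates the reparameterization idea only and is not the SVG algorithm itself.

```python
import random

def pathwise_grad(theta: float, sigma: float, n: int, seed: int = 0) -> float:
    """Estimate d/dtheta E[f(theta + sigma * xi)], xi ~ N(0, 1), for f(a) = -(a - 2)^2.

    Because the action is a deterministic function of theta and the noise xi,
    the gradient passes straight through f: d/dtheta f(theta + sigma*xi) = f'(theta + sigma*xi).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        a = theta + sigma * xi
        total += -2.0 * (a - 2.0)  # f'(a) for f(a) = -(a - 2)^2
    return total / n

# Exact answer for comparison: E[f] = -(theta - 2)^2 - sigma^2, so d/dtheta = -2 (theta - 2).
print(pathwise_grad(theta=0.5, sigma=0.3, n=20_000))  # close to 3.0
```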

Compatible Value Gradients for Reinforcement Learning of Continuous Deep Policies

- Computer Science, Mathematics
- ArXiv
- 2015

GProp, a deep reinforcement learning algorithm for continuous policies with compatible function approximation that learns the gradient of the value function using a temporal-difference method, is proposed and achieves the best performance to date on the octopus arm.

Real-time reinforcement learning by sequential Actor-Critics and experience replay

- Computer Science, Medicine
- Neural Networks
- 2009

It is formally shown that the resulting estimation bias is bounded and asymptotically vanishes, which allows the experience replay-augmented algorithm to preserve the convergence properties of the original algorithm.

Online Evolution of Deep Convolutional Network for Vision-Based Reinforcement Learning

- Computer Science
- SAB
- 2014

The Max-Pooling Convolutional Neural Network (MPCNN) compressor is evolved online, maximizing the distances between normalized feature vectors computed from the images collected by the recurrent neural network (RNN) controllers during their evaluation in the environment.

Human-level control through deep reinforcement learning

- Computer Science, Medicine
- Nature
- 2015

This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.
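The Nature DQN agent regresses Q(s, a) toward y = r + gamma * max_a' Q_target(s', a'). Here Q_target is a separate copy of the network that is overwritten with the online weights every C steps. DDPG's soft updates replace this hard copy. A minimal sketch under those assumptions follows. The parameter dictionary, the step loop and every number are hypothetical, and error clipping and the optimizer are omitted.

```python
from typing import Dict, List

Params = Dict[str, List[float]]

def dqn_target(reward: float, next_q_target: List[float], done: bool,
               gamma: float = 0.99) -> float:
    """y = r on terminal transitions, else r + gamma * max_a' Q_target(s', a')."""
    return reward if done else reward + gamma * max(next_q_target)

def maybe_sync(step: int, period: int, online: Params, target: Params) -> Params:
    """Every `period` steps, replace the target parameters with a copy of the online ones."""
    if step % period == 0:
        return {name: list(values) for name, values in online.items()}
    return target

online: Params = {"w": [0.3, -0.1]}
target: Params = {"w": [0.0, 0.0]}
for step in range(1, 6):
    online["w"][0] += 0.1  # stand-in for a gradient step on the online network
    target = maybe_sync(step, period=4, online=online, target=target)
print(dqn_target(1.0, [0.2, 0.8, 0.5], done=False, gamma=0.5), target["w"])
```

After five steps the target holds the weights copied at step 4, while the online weights have moved on. That lag is what keeps the regression target stable between syncs.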