Deterministic policy gradient algorithms

Silver, David; Lever, Guy; Heess, Nicolas; Degris, Thomas; Wierstra, Daan; Riedmiller, Martin

Preprint, HAL (Le Centre pour la Communication Scientifique Directe), Jan 1, 2014, Green OA


Abstract

In this paper we consider deterministic policy gradient algorithms for reinforcement learning with continuous actions. The deterministic policy gradient has a particularly appealing form: it is the expected gradient of the action-value function. This simple form means that the deterministic policy gradient can be estimated much more efficiently than the usual stochastic policy gradient. To ensure adequate exploration, we introduce an off-policy actor-critic algorithm that learns a deterministic target policy from an exploratory behaviour policy. We demonstrate that deterministic policy gradient algorithms can significantly outperform their stochastic counterparts in high-dimensional action spaces.
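As a rough illustration of the "expected gradient of the action-value function" mentioned in the abstract, the sketch below applies the update grad_theta J = E_s[ grad_theta mu_theta(s) * grad_a Q(s, a) at a = mu_theta(s) ] to a toy problem. Everything in it is an assumption made for illustration and is not taken from the paper: the linear policy mu_theta(s) = theta * s, the closed-form critic Q(s, a) = -(a - 2s)^2, the uniformly sampled states, and the function names `q_grad_action`, `dpg_step`, and `train`. The paper learns the critic and uses an off-policy behaviour policy for exploration; this sketch skips both and assumes the critic is known exactly.

```python
import numpy as np


def q_grad_action(state, action):
    """Gradient w.r.t. the action of the toy critic Q(s, a) = -(a - 2*s)**2."""
    return -2.0 * (action - 2.0 * state)


def dpg_step(theta, states, lr):
    """One gradient-ascent step for the linear policy mu_theta(s) = theta * s."""
    actions = theta * states      # a = mu_theta(s)
    dmu_dtheta = states           # d mu_theta(s) / d theta
    # Sample average of grad_theta mu_theta(s) * grad_a Q(s, a) at a = mu_theta(s).
    grad = np.mean(dmu_dtheta * q_grad_action(states, actions))
    return theta + lr * grad


def train(steps=200, batch=256, lr=0.5, seed=0):
    """Run repeated updates on states drawn i.i.d. from Uniform(-1, 1)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(steps):
        states = rng.uniform(-1.0, 1.0, size=batch)
        theta = dpg_step(theta, states, lr)
    return theta


theta_final = train()
```

Under these assumptions the toy critic is maximised by a = 2s, so `theta_final` should approach 2. Note that the update needs only the gradient of Q at the single action chosen by the policy, with no averaging over sampled actions, which is the property the abstract credits for the efficiency gain over stochastic policy gradients.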

Citation impact

Total citations: 1,741
References: 20


Keywords

Computer science, Algorithm

