High-Dimensional Continuous Control Using Generalized Advantage Estimation

Schulman, John; Moritz, Philipp; Levine, Sergey; Jordan, Michael I.; Abbeel, Pieter

doi:10.48550/arxiv.1506.02438

Preprint · arXiv (Cornell University) · Jun 8, 2015 · Green OA

University of California, Berkeley

Indexed in: arXiv, DataCite

Abstract

Policy gradient methods are an appealing approach in reinforcement learning because they directly optimize the cumulative reward and can straightforwardly be used with nonlinear function approximators such as neural networks. The two main challenges are the large number of samples typically required, and the difficulty of obtaining stable and steady improvement despite the nonstationarity of the incoming data. We address the first challenge by using value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, with an exponentially-weighted estimator of the advantage function that is analogous to TD(lambda). We address the second challenge by using trust region…
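In the paper, the "exponentially-weighted estimator of the advantage function" is GAE(γ, λ). It first forms the TD residual δ_t = r_t + γ·V(s_{t+1}) − V(s_t), then sums these residuals with weights decaying by γλ: Â_t = Σ_{l≥0} (γλ)^l · δ_{t+l}. Setting λ = 0 gives the one-step TD residual (lower variance, more bias). Setting λ = 1 gives the discounted return minus the baseline V(s_t) (higher variance). The Python below is a minimal sketch of that estimator, not the authors' code. It makes several assumptions. It evaluates the sum over one finite trajectory segment using the backward recursion Â_t = δ_t + γλ·Â_{t+1}. It expects the caller to pass one extra value, V(s_T), for bootstrapping, or 0.0 if the episode terminated. The function name `compute_gae` and its default γ and λ are illustrative choices.

```python
from typing import List


def compute_gae(rewards: List[float], values: List[float],
                gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Sketch of GAE(gamma, lambda) for a single trajectory segment.

    Hypothetical helper. `values` holds V(s_0), ..., V(s_T): one more entry
    than `rewards`. The last entry bootstraps the tail of the segment; pass
    0.0 there if the episode actually terminated at step T.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0  # holds A_{t+1}; zero past the end of the segment
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Backward form of sum_l (gamma*lam)^l * delta_{t+l}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


# Usage: with lam=0 each advantage is just the one-step TD residual.
print(compute_gae([1.0, 2.0], [0.5, 1.0, 2.0], gamma=0.5, lam=0.0))
```

The backward recursion makes one pass over the segment, so the cost is linear in its length. Computing each infinite sum directly would cost quadratic time. With λ = 1 the residuals telescope. The result is then the discounted reward sum plus γ^(T−t)·V(s_T), minus V(s_t).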

Citation impact

Total citations: 1,748

FWCI: —
Percentile: —
References: 23

Authors: 5

Topics & keywords

Keywords

Estimator
Artificial neural network
Reinforcement learning
Computer science
Variance (statistics)
Function (mathematics)
Bellman equation
Kinematics