Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards

Vecerik, Mel; Hester, Todd; Scholz, Jonathan; Wang, Fumin; Pietquin, Olivier; Piot, Bilal; Heess, Nicolas; Rothörl, Thomas; Lampe, Thomas; Riedmiller, Martin

doi:10.48550/arxiv.1707.08817

Preprint | arXiv (Cornell University) | Jul 27, 2017 | Green OA

Indexed in: arXiv, DataCite

Abstract

We propose a general and model-free approach for Reinforcement Learning (RL) on real robotics with sparse rewards. We build upon the Deep Deterministic Policy Gradient (DDPG) algorithm to use demonstrations. Both demonstrations and actual interactions are used to fill a replay buffer and the sampling ratio between demonstrations and transitions is automatically tuned via a prioritized replay mechanism. Typically, carefully engineered shaping rewards are required to enable the agents to efficiently explore on high dimensional control problems such as robotics. They are also required for model-based acceleration methods relying on local solvers such as iLQG (e.g. Guided Policy Search and Normalized Advantage…
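The replay mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `MixedReplayBuffer`, the parameters `alpha` and `eps`, and the choice of |TD error| plus a small constant as the priority are all assumptions made for illustration. The point it shows is that demonstrations and agent transitions share one buffer, so the fraction of demonstrations in each batch follows from the priorities rather than from a fixed ratio.

```python
import random


class MixedReplayBuffer:
    """Toy buffer holding demonstration and agent transitions together.

    Hypothetical sketch: priority = |TD error| + eps, and transitions are
    sampled with probability proportional to priority ** alpha.
    """

    def __init__(self, alpha=0.6, eps=1e-3):
        self.alpha = alpha      # assumed: how strongly priority skews sampling
        self.eps = eps          # assumed: keeps every priority strictly positive
        self.items = []         # stored transitions
        self.is_demo = []       # True if the transition came from a demonstration
        self.priorities = []

    def add(self, transition, is_demo=False):
        # New entries start at the current maximum priority (1.0 when empty).
        self.priorities.append(max(self.priorities, default=1.0))
        self.items.append(transition)
        self.is_demo.append(is_demo)

    def update_priority(self, index, td_error):
        self.priorities[index] = abs(td_error) + self.eps

    def probabilities(self):
        weights = [p ** self.alpha for p in self.priorities]
        total = sum(weights)
        return [w / total for w in weights]

    def sample(self, batch_size, rng=random):
        # Sample indices with replacement, proportional to priority ** alpha.
        idx = rng.choices(range(len(self.items)),
                          weights=self.probabilities(), k=batch_size)
        return idx, [self.items[i] for i in idx]

    def demo_fraction(self, indices):
        # Share of a sampled batch that came from demonstrations.
        return sum(self.is_demo[i] for i in indices) / len(indices)
```

In this sketch, a caller would add demonstration transitions with `is_demo=True` before training, add agent transitions as they are collected, and call `update_priority` after each learning step; `demo_fraction` then reports how the demonstration share of a batch shifts as priorities change.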

Citation impact

Total citations: 510

FWCI: —
Percentile: —
References: 20

Authors: 10

Topics & keywords

Keywords

Robotics
Reinforcement learning
Artificial intelligence
Computer science
Robot
Task (project management)
Object (grammar)
Function (biology)