Conservative Q-Learning for Offline Reinforcement Learning

Kumar, Aviral; Zhou, Aurick; Tucker, George; Levine, Sergey

doi:10.48550/arxiv.2006.04779

Preprint | arXiv (Cornell University) | Jun 8, 2020 | Green OA

University of California, Berkeley

Indexed in: arXiv, DataCite

Abstract

Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is a key challenge for large-scale real-world applications. Offline RL algorithms promise to learn effective policies from previously-collected, static datasets without further interaction. However, in practice, offline RL presents a major challenge, and standard off-policy RL methods can fail due to overestimation of values induced by the distributional shift between the dataset and the learned policy, especially when training on complex and multi-modal data distributions. In this paper, we propose conservative Q-learning (CQL), which aims to address these limitations by learning a conservative Q-function such that the…
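To make the idea of a "conservative Q-function" concrete, the following is a minimal sketch for a single state with a small discrete action set. The abstract is truncated on this page, so the objective below is not taken from it: the log-sum-exp penalty form is an assumption, and the function names, the `alpha` weight, and the toy numbers are all hypothetical. The sketch only illustrates how a penalty can push down Q-values on actions the dataset does not contain.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def conservative_penalty(q_values, dataset_action):
    """Assumed penalty: soft maximum over all actions minus Q of the logged action.

    It is never negative, and it grows when an action absent from the dataset
    is assigned a high Q-value.
    """
    return logsumexp(q_values) - q_values[dataset_action]


def conservative_loss(q_values, dataset_action, td_target, alpha=1.0):
    """Squared TD error on the logged action plus the weighted penalty.

    `alpha` (hypothetical name) trades off fitting the data against conservatism.
    """
    td_error = q_values[dataset_action] - td_target
    return 0.5 * td_error ** 2 + alpha * conservative_penalty(q_values, dataset_action)


# Toy Q-values for one state; action 0 is the only action seen in the dataset.
q_modest = [1.0, 0.0, 0.0]         # unseen actions have low values
q_overestimated = [1.0, 5.0, 0.0]  # unseen action 1 is overestimated

penalty_modest = conservative_penalty(q_modest, 0)
penalty_overestimated = conservative_penalty(q_overestimated, 0)
```

In this toy setup, `penalty_overestimated` is larger than `penalty_modest`, so minimizing the loss would lower the inflated value of the unseen action while leaving the TD fit on the logged action in place.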

Citation impact

Total citations: 537

References: 60

Authors: 4

Topics & keywords

Keywords

Reinforcement learning
Computer science
Function (biology)
Modal
Artificial intelligence
Machine learning
Key (lock)
Value (mathematics)
