Distributional Soft Actor-Critic With Three Refinements

University of Science and Technology Beijing · Tsinghua University · +1 more institution

Indexed in: Crossref, PubMed

Abstract

Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks. However, many model-free RL algorithms experience performance degradation due to inaccurate value estimation, particularly the overestimation of Q-values, which can lead to suboptimal policies. To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSACv1), an off-policy RL algorithm that enhances value estimation accuracy by learning a continuous Gaussian value distribution. Despite its effectiveness, DSACv1 faces challenges such as training instability and sensitivity to reward scaling, caused by high variance in critic gradients due to return randomness. In this…

Citation impact

Total citations: 53
FWCI: 60.70
Percentile: 100%
References: 38

Authors

9

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Computer vision

Funding