Distributional Soft Actor-Critic With Three Refinements
University of Science and Technology Beijing · Tsinghua University · +1 more institution
Abstract
Reinforcement learning (RL) has shown remarkable success in solving complex decision-making and control tasks. However, many model-free RL algorithms experience performance degradation due to inaccurate value estimation, particularly the overestimation of Q-values, which can lead to suboptimal policies. To address this issue, we previously proposed the Distributional Soft Actor-Critic (DSAC or DSACv1), an off-policy RL algorithm that enhances value estimation accuracy by learning a continuous Gaussian value distribution. Despite its effectiveness, DSACv1 faces challenges such as training instability and sensitivity to reward scaling, caused by high variance in critic gradients due to return randomness. In this…
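As an illustrative sketch only (not the authors' implementation), the idea of a Gaussian value distribution can be made concrete by treating the critic output as a mean and standard deviation and scoring a sampled return target with the Gaussian negative log-likelihood. The function names `gaussian_nll` and `nll_grad_mean` and all numeric values below are hypothetical. The gradient with respect to the mean is proportional to the residual between the random target and the predicted mean, which hints at why noisy or rescaled returns could produce large swings in critic gradients.

```python
import math


def gaussian_nll(target, mean, std):
    """Negative log-likelihood of a scalar target under N(mean, std^2)."""
    var = std * std
    return 0.5 * math.log(2.0 * math.pi * var) + (target - mean) ** 2 / (2.0 * var)


def nll_grad_mean(target, mean, std):
    """Analytic derivative of the NLL with respect to the mean: -(target - mean) / std^2."""
    return -(target - mean) / (std * std)


# Hypothetical values: one sampled return target and one critic prediction.
target, mean, std = 1.5, 1.0, 0.5
g1 = nll_grad_mean(target, mean, std)

# Same situation with rewards scaled by 10; std is deliberately left unscaled
# (an assumption made here to expose the scale dependence of the gradient).
g10 = nll_grad_mean(10 * target, 10 * mean, std)

print(g1, g10)
```

In this toy setting the gradient grows linearly with the reward scale when the predicted standard deviation does not track it, so any remedy has to normalize or bound that residual term in some way.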
Citation impact
- FWCI: 60.70
- Percentile: 100%
- References: 38
Authors: 9
Topics & keywords
- Computer science
- Artificial intelligence
- Computer vision