Towards Understanding Convergence and Generalization of AdamW

Zhou, Pan; Xie, Xingyu; Lin, Zhouchen; Yan, Shuicheng

doi:10.1109/tpami.2024.3382294

Article · IEEE Transactions on Pattern Analysis and Machine Intelligence · Mar 27, 2024 · Green OA

Singapore Management University · Peking University · +2 more institutions

Indexed in: Crossref, PubMed

Abstract

AdamW modifies Adam by adding a decoupled weight decay that shrinks the network weights at every training iteration. For adaptive algorithms, this decoupled weight decay does not affect the specific optimization steps, and so differs from the widely used $\ell_2$-regularizer, which changes the optimization steps by altering the first- and second-order gradient moments. Despite AdamW's great practical success, an analysis of its convergence behavior and of its generalization improvement over Adam and $\ell_2$-regularized Adam ($\ell_2$-Adam) is still missing. To address this, we prove the convergence of AdamW and justify its generalization advantages over Adam and $\ell_2$-Adam. Specifically, AdamW provably converges but…
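
The distinction the abstract draws between decoupled weight decay and the $\ell_2$-regularizer can be illustrated with a minimal scalar sketch. It assumes the commonly used AdamW update, in which the decay term is scaled by the learning rate; the paper's exact formulation and notation may differ. The function name `adam_step` and the parameters `l2` and `decoupled_wd` are hypothetical, chosen only for this illustration.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              l2=0.0, decoupled_wd=0.0):
    """One Adam-style update on a scalar weight (illustrative sketch only).

    l2           -- l2-regularizer coefficient (l2-Adam): added to the gradient,
                    so it changes both gradient moments.
    decoupled_wd -- decoupled weight decay coefficient (AdamW): applied to the
                    weight directly, so the moments never see it.
    """
    g = g + l2 * w                        # l2-Adam: penalty gradient enters the moments
    m = beta1 * m + (1 - beta1) * g       # first-order moment
    v = beta2 * v + (1 - beta2) * g * g   # second-order moment
    m_hat = m / (1 - beta1 ** t)          # bias correction
    v_hat = v / (1 - beta2 ** t)
    # AdamW: the decay acts on w outside the adaptive step (assumed lr-scaled form).
    w = w - lr * m_hat / (math.sqrt(v_hat) + eps) - lr * decoupled_wd * w
    return w, m, v


# Zero loss gradient isolates the effect of the two kinds of weight decay.
w_adamw, m_adamw, v_adamw = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, decoupled_wd=0.5)
w_l2, m_l2, v_l2 = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1, l2=0.5)
print(w_adamw, m_adamw, v_adamw)
print(w_l2, m_l2, v_l2)
```

With a zero loss gradient, the decoupled variant leaves both moments at zero and only shrinks the weight, while the $\ell_2$ variant feeds the penalty gradient into the moments, so the adaptive normalization rescales its effect.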

Citation impact

Total citations: 211
FWCI: 66.45
Percentile: 100%
References: 68
Authors: 4

Topics & keywords

Keywords

Computer science
Generalization
Artificial intelligence
Convergence (economics)
Pattern recognition (psychology)
Mathematics

UN Sustainable Development Goals

Reduced inequalities

Funding

Ministry of Education - Singapore