Revisiting Knowledge Distillation via Label Smoothing Regularization

Yuan, Li; Tay, Francis EH; Li, Guilin; Wang, Tao; Feng, Jiashi

doi:10.1109/cvpr42600.2020.00396

Article · Jun 1, 2020 · Closed access

National University of Singapore · Huawei Technologies (Sweden)

Indexed in Crossref

Abstract

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models are deployed to teach weaker students in practice. In this work, we challenge this common belief with the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly. To explain…
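
For readers unfamiliar with the two techniques named in the title, the following is a minimal sketch of a commonly used knowledge distillation loss alongside a label smoothing loss, so that the structural similarity between them is visible. It assumes the widely used temperature-scaled formulation of KD; the truncated abstract above does not state the paper's exact equations, so this should not be read as the authors' implementation. The function names and the parameters alpha, temperature, and epsilon are illustrative assumptions and do not come from this record.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, pred):
    """Cross-entropy H(target, pred) for two discrete distributions."""
    return -sum(t * math.log(p) for t, p in zip(target, pred) if t > 0)


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Assumed KD objective: hard-label cross-entropy mixed with a
    temperature-scaled KL term toward the teacher's soft targets."""
    num_classes = len(student_logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(soft_teacher, soft_student)
        if t > 0
    )
    # The temperature**2 factor is the usual gradient-scale correction.
    return (1 - alpha) * hard + alpha * temperature ** 2 * kl


def label_smoothing_loss(student_logits, label, epsilon=0.1):
    """Cross-entropy against a one-hot target mixed with a uniform distribution."""
    num_classes = len(student_logits)
    smoothed = [
        (1 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
        for k in range(num_classes)
    ]
    return cross_entropy(smoothed, softmax(student_logits))
```

In this sketch, both losses train the student against a softened target: KD takes the soft target from a teacher's outputs, while label smoothing uses a fixed uniform distribution. Swapping which network's logits are passed as teacher_logits corresponds to the reversed procedure mentioned in observation 1.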

Citation impact

Total citations: 520

FWCI: 43.38
Percentile: 100%
References: 27

Authors (5)

Li Yuan (corresponding author), National University of Singapore
Francis EH Tay, National University of Singapore
Guilin Li, Huawei Technologies (Sweden)
Tao Wang, National University of Singapore
Jiashi Feng, National University of Singapore

Topics & keywords

Keywords

Regularization (linguistics)
Smoothing
Computer science
Distillation
Artificial intelligence
Machine learning
Artificial neural network
Deep neural networks

UN Sustainable Development Goals

Quality Education
