Revisiting Knowledge Distillation via Label Smoothing Regularization
National University of Singapore · Huawei Technologies (Sweden)
Abstract
Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information about similarities among categories that the teacher model provides, and accordingly, only strong teacher models are typically deployed to teach weaker students in practice. In this work, we challenge this common belief with the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly. To explain…
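The abstract refers to the standard KD procedure and to running it in reverse. As a hedged illustration only, the sketch below implements the commonly used temperature-scaled KD objective (in the style of Hinton et al., 2015) alongside label-smoothing cross-entropy for a single example. The function names and the defaults alpha = 0.5, T = 4 and epsilon = 0.1 are illustrative assumptions, not settings taken from this paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """H(target, probs) = -sum_k target_k * log(probs_k); zero-weight terms are skipped."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def kl_divergence(p, q):
    """KL(p || q) = sum_k p_k * log(p_k / q_k)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits, teacher_logits, label, alpha=0.5, temperature=4.0):
    """Illustrative KD objective: (1 - alpha) * hard-label CE + alpha * T^2 * KL(teacher || student)."""
    num_classes = len(student_logits)
    one_hot = [1.0 if k == label else 0.0 for k in range(num_classes)]
    # Hard-label term uses the ordinary (T = 1) student distribution.
    hard = cross_entropy(one_hot, softmax(student_logits))
    # Soft term compares temperature-softened teacher and student distributions;
    # scaling by T^2 is common practice to keep its magnitude comparable across temperatures.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    soft = kl_divergence(p_teacher, p_student) * temperature ** 2
    return (1 - alpha) * hard + alpha * soft


def label_smoothing_loss(student_logits, label, epsilon=0.1):
    """Cross-entropy against a target that mixes the one-hot label with a uniform distribution."""
    num_classes = len(student_logits)
    smoothed = [
        (1 - epsilon) * (1.0 if k == label else 0.0) + epsilon / num_classes
        for k in range(num_classes)
    ]
    return cross_entropy(smoothed, softmax(student_logits))
```

Swapping the first two arguments, `kd_loss(teacher_logits, student_logits, label)`, gives the reversed direction in which the student's outputs serve as soft targets for the teacher. Note also that with a uniform teacher (all-equal logits), T = 1 and alpha = epsilon, `kd_loss` differs from `label_smoothing_loss` only by the constant epsilon · log K (K classes), since KL(u ‖ p) = H(u, p) − log K.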
Citation impact
- FWCI: 43.38
- Percentile: 100%
- References: 27
Authors (5)
- Li Yuan (corresponding author), National University of Singapore
- Francis EH Tay, National University of Singapore
- Guilin Li, Huawei Technologies (Sweden)
- Tao Wang, National University of Singapore
- Jiashi Feng, National University of Singapore
Topics & keywords
- Regularization
- Smoothing
- Computer science
- Distillation
- Artificial intelligence
- Machine learning
- Artificial neural network
- Deep neural networks
- Quality Education