Article · Jun 1, 2020 · Closed access

Revisiting Knowledge Distillation via Label Smoothing Regularization

National University of Singapore · Huawei Technologies (Sweden)

Indexed in Crossref

Abstract

Knowledge Distillation (KD) aims to distill the knowledge of a cumbersome teacher model into a lightweight student model. Its success is generally attributed to the privileged information on similarities among categories provided by the teacher model, and in this sense, only strong teacher models are deployed to teach weaker students in practice. In this work, we challenge this common belief with the following experimental observations: 1) beyond the acknowledgment that the teacher can improve the student, the student can also enhance the teacher significantly by reversing the KD procedure; 2) a poorly-trained teacher with much lower accuracy than the student can still improve the latter significantly. To explain…
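For readers unfamiliar with the two techniques named in the title, the following is a minimal sketch of a commonly used KD objective and of label smoothing. It is not taken from the paper: the function names, the weighting `alpha`, the `temperature` value, and the choice to omit any temperature-squared scaling are all assumptions made for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(target, predicted):
    """H(target, predicted) = -sum_k target_k * log(predicted_k)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def smoothed_label(true_class, num_classes, alpha=0.1):
    """Label smoothing: mix the one-hot label with a uniform distribution."""
    uniform = 1.0 / num_classes
    return [
        (1 - alpha) * (1.0 if k == true_class else 0.0) + alpha * uniform
        for k in range(num_classes)
    ]

def kd_loss(student_logits, teacher_logits, true_class, alpha=0.5, temperature=4.0):
    """Assumed KD objective: hard-label cross-entropy plus a KL term that
    pulls the student's softened outputs toward the teacher's."""
    one_hot = [1.0 if k == true_class else 0.0 for k in range(len(student_logits))]
    hard = cross_entropy(one_hot, softmax(student_logits))
    soft = kl_divergence(
        softmax(teacher_logits, temperature),
        softmax(student_logits, temperature),
    )
    return (1 - alpha) * hard + alpha * soft

# Hypothetical logits for a 3-class problem, for illustration only.
loss = kd_loss([1.0, 2.0, 0.5], [0.8, 2.5, 0.3], true_class=1)
```

In this sketch both objectives have the same shape: a cross-entropy against the true label combined with a second target distribution. With label smoothing that second target is a fixed uniform distribution, while with KD it comes from the teacher's softened outputs.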

Citation impact

  • Total citations: 520
  • FWCI: 43.38
  • Percentile: 100%
  • References: 27

Authors

5

Topics & keywords

Keywords
  • Regularization (linguistics)
  • Smoothing
  • Computer science
  • Distillation
  • Artificial intelligence
  • Machine learning
  • Artificial neural network
  • Deep neural networks
UN Sustainable Development Goals
  • Quality Education