Curriculum Temperature for Knowledge Distillation

Nankai University · Nanjing University of Science and Technology · +2 more institutions

Indexed in Crossref

Abstract

Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic…
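To make the role of the temperature concrete, the following is a minimal sketch of the standard temperature-scaled distillation loss that the abstract refers to. It is not the CTKD method itself, whose dynamic temperature is not specified in the text above. The function names `softmax_with_temperature` and `kd_loss`, the example logits, and the conventional T² scaling factor are all assumptions made for illustration.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits divided by the temperature (numerically stable)."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, temperature):
    """KL(teacher || student) between temperature-softened distributions.

    The T^2 factor is the conventional gradient-scale correction used in
    standard distillation; it is an assumption here, not taken from the page.
    """
    p = softmax_with_temperature(teacher_logits, temperature)
    q = softmax_with_temperature(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# Hypothetical logits for a three-class problem.
teacher = [5.0, 2.0, 0.5]
student = [2.0, 1.5, 1.0]

for t in (1.0, 4.0, 8.0):
    top_prob = max(softmax_with_temperature(teacher, t))
    print(f"T={t}: teacher top prob={top_prob:.3f}, loss={kd_loss(student, teacher, t):.4f}")
```

A higher temperature flattens both distributions, so the teacher's top-class probability shrinks and the target the student has to match changes with it. A fixed value of `temperature` therefore fixes one notion of task difficulty for the whole of training.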

Citation impact

  • Total citations: 188
  • FWCI: 18.34
  • Percentile: 100%
  • References: 75

Authors

8

Topics & keywords

Keywords
  • Distillation
  • Curriculum
  • Computer science
  • Task (project management)
  • Function (biology)
  • Computation
  • GLUE
  • Simple (philosophy)
UN Sustainable Development Goals
  • Quality Education

Funding