Curriculum Temperature for Knowledge Distillation
Nankai University · Nanjing University of Science and Technology · +2 more institutions
Abstract
Most existing distillation methods ignore the flexible role of the temperature in the loss function and fix it as a hyper-parameter that can be decided by an inefficient grid search. In general, the temperature controls the discrepancy between two distributions and can faithfully determine the difficulty level of the distillation task. Keeping a constant temperature, i.e., a fixed level of task difficulty, is usually sub-optimal for a growing student during its progressive learning stages. In this paper, we propose a simple curriculum-based technique, termed Curriculum Temperature for Knowledge Distillation (CTKD), which controls the task difficulty level during the student's learning career through a dynamic…
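To make the role of the temperature concrete, here is a minimal sketch of the standard temperature-scaled distillation loss, written in plain Python. It is not the CTKD method itself, whose dynamic temperature is not specified in the abstract above. The function names (`softmax`, `kl_divergence`, `kd_loss`), the example logits, and the conventional `temperature ** 2` scaling factor are assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; larger temperatures give flatter outputs."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(teacher_logits, student_logits, temperature):
    """Distillation loss: KL between softened teacher and student outputs.

    The temperature ** 2 factor is the commonly used rescaling that keeps
    gradient magnitudes comparable across temperatures (an assumption here,
    not something stated in the abstract).
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return temperature ** 2 * kl_divergence(p_teacher, p_student)


# Hypothetical logits for a 3-class problem (illustrative values only).
teacher_logits = [5.0, 2.0, 0.5]
student_logits = [2.0, 2.5, 1.0]

for t in (1.0, 2.0, 4.0, 8.0):
    gap = kl_divergence(softmax(teacher_logits, t), softmax(student_logits, t))
    print(f"T={t}: KL gap={gap:.4f}, KD loss={kd_loss(teacher_logits, student_logits, t):.4f}")
```

With the logits held fixed, the unscaled KL gap between the two softened distributions changes as the temperature changes, which is the sense in which the temperature sets how hard the matching task is. A fixed temperature pins that difficulty for the whole of training, whereas a schedule or a learned temperature would vary it.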
Citation impact
- FWCI: 18.34
- Percentile: 100%
- References: 75
Authors
- 8
Topics & keywords
- Distillation
- Curriculum
- Computer science
- Task (project management)
- Function (biology)
- Computation
- GLUE
- Simple (philosophy)
- Quality Education