A Comprehensive Overhaul of Feature Distillation
Naver (South Korea) · Seoul National University
Abstract
We investigate the design aspects of feature distillation methods for network compression and propose a novel feature distillation method whose distillation loss is designed to create synergy among four aspects: teacher transform, student transform, distillation feature position, and distance function. Our proposed distillation loss includes a feature transform with a newly designed margin ReLU, a new distillation feature position, and a partial L2 distance function that skips redundant information that adversely affects the compression of the student. On ImageNet, our proposed method achieves a top-1 error of 21.65% with ResNet50, outperforming the teacher network, ResNet152.…
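The abstract names a margin ReLU and a partial L2 distance without defining them. The sketch below illustrates one plausible reading and should not be taken as the authors' implementation. It assumes that the margin ReLU is max(x, m) with a non-positive margin m, and that the partial L2 distance skips positions where the teacher target is non-positive and the student is already at or below it. The function names `margin_relu` and `partial_l2`, the scalar margin, and the toy feature values are all hypothetical.

```python
from typing import Sequence


def margin_relu(x: float, margin: float) -> float:
    """Assumed form: max(x, margin) with margin <= 0.

    Positive responses pass through unchanged; negative responses are
    floored at the margin instead of at zero.
    """
    if margin > 0:
        raise ValueError("margin is assumed to be non-positive")
    return max(x, margin)


def partial_l2(teacher: Sequence[float], student: Sequence[float]) -> float:
    """Assumed form: squared L2 distance that skips 'redundant' positions.

    A position contributes nothing when the teacher target is non-positive
    and the student value is already at or below it (s <= t <= 0).
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student features must have equal length")
    total = 0.0
    for t, s in zip(teacher, student):
        if s <= t <= 0:
            continue  # skipped: student is already below a non-positive target
        total += (t - s) ** 2
    return total


# Toy example with made-up feature values (not data from the paper).
teacher_raw = [1.5, -3.0, -0.2, 0.7]
student = [1.0, -5.0, 0.4, 0.7]
margin = -1.0  # hypothetical scalar margin

teacher_target = [margin_relu(t, margin) for t in teacher_raw]
loss = partial_l2(teacher_target, student)
print(teacher_target, loss)
```

In this toy case the second position is skipped because the student value (-5.0) is already below the floored teacher target (-1.0), so only the first and third positions contribute, giving a loss of about 0.61.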
Citation impact
- FWCI: 21.84
- Percentile: 100%
- References: 56
Authors
- 6
Topics & keywords
- Distillation
- Computer science
- Feature (linguistics)
- Margin (machine learning)
- Feature extraction
- Artificial intelligence
- Position (finance)
- Code (set theory)
- Quality Education