Decoupled Knowledge Distillation

Zhao, Borui; Cui, Quan; Song, Renjie; Qiu, Yiyu; Liang, Jiajun

doi:10.1109/cvpr52688.2022.01165

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

Megvii (China) · Vi Technology (United States) · +2 more institutions

Indexed in Crossref

Abstract

State-of-the-art distillation methods are mainly based on distilling deep features from intermediate layers, while the significance of logit distillation is greatly overlooked. To provide a novel viewpoint to study logit distillation, we re-formulate the classical KD loss into two parts, i.e., target class knowledge distillation (TCKD) and non-target class knowledge distillation (NCKD). We empirically investigate and prove the effects of the two parts: TCKD transfers knowledge concerning the “difficulty” of training samples, while NCKD is the prominent reason why logit distillation works. More importantly, we reveal that the classical KD loss is a coupled formulation, which (1) suppresses the effectiveness of…
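The split into a target-class part and a non-target-class part can be checked numerically. The sketch below is a minimal illustration, not the authors' implementation: it assumes the classical KD loss is the KL divergence between temperature-softened teacher and student softmax outputs (omitting the usual T^2 scaling), takes TCKD to be the KL divergence between the binary (target vs. all other classes) distributions, and takes NCKD to be the KL divergence between the distributions renormalized over the non-target classes. Under those assumptions the identity KD = TCKD + (1 - p_target_teacher) * NCKD follows directly from the definition of KL divergence. The function names, the example logits, and the temperature are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl(p, q):
    """KL divergence KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def decompose_kd(teacher_logits, student_logits, target, temperature=1.0):
    """Return (kd, tckd, nckd, weight) with kd == tckd + weight * nckd.

    Assumption: `weight` is the teacher's total probability mass on the
    non-target classes, i.e. 1 - p_target_teacher.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)

    # Classical KD: KL between the full softened distributions.
    kd = kl(p_teacher, p_student)

    # Probability mass on non-target classes for each model.
    rest_teacher = sum(p for i, p in enumerate(p_teacher) if i != target)
    rest_student = sum(p for i, p in enumerate(p_student) if i != target)

    # TCKD: KL between binary (target, not-target) distributions.
    tckd = kl([p_teacher[target], rest_teacher],
              [p_student[target], rest_student])

    # NCKD: KL between distributions renormalized over non-target classes.
    nt_teacher = [p / rest_teacher for i, p in enumerate(p_teacher) if i != target]
    nt_student = [p / rest_student for i, p in enumerate(p_student) if i != target]
    nckd = kl(nt_teacher, nt_student)

    return kd, tckd, nckd, rest_teacher


# Hypothetical logits for a 4-class problem; class 0 is the target.
kd, tckd, nckd, weight = decompose_kd(
    [5.0, 2.0, 0.5, -1.0], [3.0, 2.5, 0.0, -0.5], target=0, temperature=4.0
)
print(kd, tckd + weight * nckd)  # the two values agree up to rounding
```

Note that the non-target term enters the classical loss multiplied by the teacher's non-target probability mass, so a teacher that is very confident about the target class gives that term a small weight.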

Citation impact

Total citations: 848

FWCI: 46.32
Percentile: 100%
References: 56

Authors: 5

Topics & keywords

Keywords

Distillation
Computer science
Flexibility (engineering)
Class (philosophy)
Artificial intelligence
Feature (linguistics)
Machine learning
Image (mathematics)