FitNets: Hints for Thin Deep Nets

Romero, Adriana; Ballas, Nicolas; Kahou, Samira Ebrahimi; Chassang, Antoine; Gatta, Carlo; Bengio, Yoshua

doi:10.48550/arxiv.1412.6550

Preprint · arXiv (Cornell University) · Dec 19, 2014 · Green OA

Universitat de Barcelona · École Nationale Supérieure d'Architecture de Lyon · +3 more institutions

Indexed in: arXiv, DataCite

Abstract

While depth tends to improve network performances, it also makes gradient-based training more difficult since deeper networks tend to be more non-linear. The recently proposed knowledge distillation approach is aimed at obtaining small and fast-to-execute models, and it has shown that a student network could imitate the soft output of a larger teacher network or ensemble of networks. In this paper, we extend this idea to allow the training of a student that is deeper and thinner than the teacher, using not only the outputs but also the intermediate representations learned by the teacher as hints to improve the training process and final performance of the student. Because the student intermediate hidden layer…
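The two training signals the abstract names, the teacher's soft outputs and its intermediate representations used as hints, can be illustrated with a small sketch. This is not the authors' implementation. The abstract is cut off before it says how the difference in layer width between student and teacher is handled, so the linear map below is an assumption made for illustration, as are all function names, the temperature value, and the toy numbers.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_target_loss(student_logits, teacher_logits, temperature=3.0):
    """Cross-entropy between the softened teacher and student outputs."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

def linear_map(weights, vector):
    """Hypothetical regressor: project a thin student layer to teacher width."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]

def hint_loss(student_hidden, teacher_hidden, weights):
    """Half the squared L2 distance between the teacher hint and the mapped student layer."""
    mapped = linear_map(weights, student_hidden)
    return 0.5 * sum((t - m) ** 2 for t, m in zip(teacher_hidden, mapped))

# Toy values (assumed): a 3-unit teacher layer and a thinner 2-unit student layer.
teacher_hidden = [1.0, 0.0, -1.0]
student_hidden = [0.5, -0.5]
weights = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0]]  # 3x2 map, student width to teacher width

if __name__ == "__main__":
    print("hint loss:", hint_loss(student_hidden, teacher_hidden, weights))
    print("soft-target loss:", soft_target_loss([0.1, 1.0, 2.0], [2.0, 1.0, 0.1]))
```

In this sketch the hint loss shrinks as the mapped student layer approaches the teacher layer, and the soft-target loss is smallest when the student's softened outputs match the teacher's.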

Citation impact

Total citations: 2,033

FWCI: —
Percentile: —
References: 27

Authors: 6

Topics & keywords

Keywords

Computer science
Layer (electronics)
Process (computing)
State (computer science)
Training (meteorology)
Artificial intelligence
Mathematics education
Algorithm

UN Sustainable Development Goals

Quality Education