BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Zaken, Elad Ben; Goldberg, Yoav; Ravfogel, Shauli

doi:10.18653/v1/2022.acl-short.1

Preprint · Jan 1, 2022 · Gold OA

Laboratoire d'Informatique de Paris-Nord · Bar-Ilan University · +1 more institution

Indexed in Crossref

Abstract

We introduce BitFit, a sparse-finetuning method where only the bias-terms of the model (or a subset of them) are being modified. We show that with small-to-medium training data, applying BitFit on pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, the method is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning: they support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
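As a rough illustration of the bias-only selection the abstract describes, the sketch below marks only bias terms (or a named subset of them) as trainable and reports what fraction of all parameters that leaves. The parameter names, the sizes in `TOY_PARAMS`, and the helper functions `select_bias_terms` and `trainable_fraction` are hypothetical and chosen for illustration; they are not taken from the paper or from any particular library.

```python
# Hypothetical parameter inventory for one toy transformer layer:
# parameter name -> number of scalar entries. Names and sizes are
# illustrative assumptions, not values reported in the paper.
TOY_PARAMS = {
    "attention.query.weight": 768 * 768,
    "attention.query.bias": 768,
    "attention.key.weight": 768 * 768,
    "attention.key.bias": 768,
    "attention.value.weight": 768 * 768,
    "attention.value.bias": 768,
    "ffn.intermediate.weight": 768 * 3072,
    "ffn.intermediate.bias": 3072,
    "ffn.output.weight": 3072 * 768,
    "ffn.output.bias": 768,
}


def select_bias_terms(param_sizes, only=None):
    """Return the names a bias-only scheme would leave trainable.

    If `only` is given, keep just the bias terms whose name contains one of
    the listed substrings (training a subset of the bias terms).
    """
    names = [name for name in param_sizes if name.endswith(".bias")]
    if only is not None:
        names = [name for name in names if any(key in name for key in only)]
    return names


def trainable_fraction(param_sizes, trainable_names):
    """Fraction of all scalar parameters that would receive updates."""
    total = sum(param_sizes.values())
    trainable = sum(param_sizes[name] for name in trainable_names)
    return trainable / total


if __name__ == "__main__":
    all_biases = select_bias_terms(TOY_PARAMS)
    query_only = select_bias_terms(TOY_PARAMS, only=["query"])
    print("trainable (all biases):", all_biases)
    print("trainable (query bias only):", query_only)
    print("fraction trainable:", trainable_fraction(TOY_PARAMS, all_biases))
```

In a deep-learning framework the same selection would typically be expressed by freezing every parameter whose name does not identify a bias term, so that only the selected terms receive gradient updates; the pure-Python version above only shows the bookkeeping.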

Citation impact

Total citations: 664

FWCI: 84.06
Percentile: 100%
References: 45

Authors: 3

Topics & keywords

Keywords

Computer science
Transformer
Language model
Artificial intelligence
Simple (philosophy)
Process (computing)
Training set
Machine learning

UN Sustainable Development Goals

Quality Education

Funding

European Commission (EC)
Award: 802774