DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion

Sorbonne Université · Valeo (France)

Indexed in Crossref

Abstract

Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test-time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks.…
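The abstract is cut off before it explains how a single shared encoder/decoder serves many tasks. Going by the title's "dynamic token expansion", the toy sketch below assumes that each new task adds only one learned token, and that this token queries the shared decoder over the encoder's features. `ToyDynamicTokenModel`, `attend`, and the one-matrix "encoder" are hypothetical stand-ins, not the authors' code. Per-task classifiers, training, and forgetting mitigation are omitted.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def attend(query, keys):
    """Single-head scaled dot-product attention; keys double as values (toy simplification)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    return [sum(w * key[i] for w, key in zip(weights, keys)) for i in range(d)]


class ToyDynamicTokenModel:
    """Hypothetical toy model illustrating shared weights plus per-task tokens (not DyTox itself)."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        # Shared across all tasks: a toy "encoder" that is a single linear map.
        self.encoder = [[self.rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(dim)]
        # Grows by one vector per task; nothing else is added per task in this sketch.
        self.task_tokens = []

    def add_task(self):
        # Dynamic token expansion: append one new learned token for the new task.
        self.task_tokens.append([self.rng.gauss(0, 0.1) for _ in range(self.dim)])

    def encode(self, patches):
        # Apply the shared linear encoder to every input patch vector.
        return [[sum(w * x for w, x in zip(row, p)) for row in self.encoder] for p in patches]

    def forward(self, patches):
        feats = self.encode(patches)
        # The same (parameter-free, in this toy) decoder is reused by every task token,
        # producing one task-specific embedding per token.
        return [attend(tok, feats) for tok in self.task_tokens]

    def num_params(self):
        # Returns (shared parameters, per-task token parameters).
        return self.dim * self.dim, len(self.task_tokens) * self.dim
```

With `dim=4` and three tasks, the shared part stays at 16 parameters while the tokens add 4 per task, which is the kind of low per-task overhead the abstract contrasts with approaches that expand whole sub-networks.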

Citation impact

Total citations: 320
FWCI: 31.37
Percentile: 100%
References: 125

Authors: 4

Topics & keywords

Keywords
  • Computer science
  • Forgetting
  • Encoder
  • Transformer
  • Identifier
  • Distributed computing
  • Artificial intelligence
  • Computer network