DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion
Sorbonne Université · Valeo (France)
Abstract
Deep network architectures struggle to continually learn new tasks without forgetting the previous tasks. A recent trend indicates that dynamic architectures based on an expansion of the parameters can reduce catastrophic forgetting efficiently in continual learning. However, existing approaches often require a task identifier at test-time, need complex tuning to balance the growing number of parameters, and barely share any information across tasks. As a result, they struggle to scale to a large number of tasks without significant overhead. In this paper, we propose a transformer architecture based on a dedicated encoder/decoder framework. Critically, the encoder and decoder are shared among all tasks.…
Citation impact
- FWCI: 31.37
- Percentile: 100%
- References: 125
Authors: 4
Topics & keywords
- Computer science
- Forgetting
- Encoder
- Transformer
- Identifier
- Distributed computing
- Artificial intelligence
- Computer network