ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models
Indexed in: Crossref, DOAJ
Abstract
Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: They can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pipelines. Because byte or character sequences are longer than token sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with minimal…
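To make the abstract's two claims concrete (no tokenizer or vocabulary is needed, but sequences get longer), here is a minimal sketch of byte-level input encoding. It assumes IDs 0 to 2 are reserved for special tokens (padding, end-of-sequence, unknown), so each UTF-8 byte b maps to ID b + 3; to our knowledge this matches the Hugging Face ByT5 tokenizer, but treat the offset and the helper names `encode` and `decode` as illustrative assumptions, not the paper's released code.

```python
# Minimal sketch of byte-level ("token-free") input encoding.
# ASSUMPTION: IDs 0-2 are reserved for special tokens (pad, eos, unk),
# so each UTF-8 byte b maps to ID b + 3 and EOS is ID 1.
# encode/decode are hypothetical helper names for illustration.
PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
NUM_SPECIAL = 3


def encode(text: str, add_eos: bool = True) -> list[int]:
    """Map text to byte IDs; no vocabulary file or preprocessing needed."""
    ids = [b + NUM_SPECIAL for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode(ids: list[int]) -> str:
    """Invert encode(), dropping the reserved special IDs."""
    raw = bytes(i - NUM_SPECIAL for i in ids if i >= NUM_SPECIAL)
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Any language works "out of the box": every string is just UTF-8 bytes.
    for text in ["hello", "héllo", "こんにちは"]:
        ids = encode(text, add_eos=False)
        print(f"{text!r}: {len(text)} chars -> {len(ids)} byte IDs: {ids}")
        assert decode(ids) == text
```

The Japanese string grows from 5 characters to 15 byte IDs, and the accented "héllo" from 5 to 6, which illustrates why byte sequences are longer than character sequences and typically much longer than subword token sequences.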
Citation impact
- Total citations: 236
- FWCI: 27.75
- Percentile: 100%
- References: 110
Authors: 8

Topics & keywords
- Computer science
- Byte
- Security token
- Language model
- Inference
- Transformer
- Artificial intelligence
- Programming language