ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Google (United States)

Indexed in: Crossref, DOAJ

Abstract

Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: They can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pipelines. Because byte or character sequences are longer than token sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with minimal…
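To make the idea of operating directly on bytes concrete, the following is a minimal sketch of byte-level encoding in Python. It is not taken from the paper: the function names `encode` and `decode` and the constant `NUM_SPECIAL_IDS` (a small ID range reserved for special symbols) are assumptions chosen for illustration only. It shows that any UTF-8 text maps to integer IDs without a learned vocabulary, and that the resulting sequence is longer than the character count for non-ASCII text.

```python
# Illustrative sketch only: names and the reserved-ID offset are assumptions,
# not the paper's implementation.
NUM_SPECIAL_IDS = 3  # assumed: IDs 0..2 are reserved for special symbols


def encode(text: str) -> list[int]:
    """Map text to integer IDs, one per UTF-8 byte, shifted past the reserved IDs."""
    return [b + NUM_SPECIAL_IDS for b in text.encode("utf-8")]


def decode(ids: list[int]) -> str:
    """Invert encode(); reserved IDs are skipped, malformed byte runs are dropped."""
    raw = bytes(i - NUM_SPECIAL_IDS for i in ids if i >= NUM_SPECIAL_IDS)
    return raw.decode("utf-8", errors="ignore")


if __name__ == "__main__":
    sample = "héllo"
    ids = encode(sample)
    # "é" occupies two bytes in UTF-8, so there are more IDs than characters.
    print(len(sample), len(ids))
    print(decode(ids) == sample)
```

Under these assumptions the ID space never exceeds 256 plus the reserved range, whatever the language of the input; the trade-off is the longer sequence length that the abstract mentions.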

Citation impact

Total citations: 236
FWCI: 27.75
Percentile: 100%
References: 110

Authors

8

Topics & keywords

Keywords
  • Computer science
  • Byte
  • Security token
  • Language model
  • Inference
  • Transformer
  • Artificial intelligence
  • Programming language