ByT5: Towards a Token-Free Future with Pre-trained Byte-to-Byte Models

Xue, Linting; Barua, Aditya; Constant, Noah; Al‐Rfou, Rami; Narang, Sharan; Kale, Mihir; Roberts, Adam P.; Raffel, Colin

doi:10.1162/tacl_a_00461

Article · Transactions of the Association for Computational Linguistics · Jan 1, 2022 · Diamond OA

Google (United States)

Indexed in: Crossref, DOAJ

Abstract

Most widely used pre-trained language models operate on sequences of tokens corresponding to word or subword units. By comparison, token-free models that operate directly on raw text (bytes or characters) have many benefits: They can process text in any language out of the box, they are more robust to noise, and they minimize technical debt by removing complex and error-prone text preprocessing pipelines. Because byte or character sequences are longer than token sequences, past work on token-free models has often introduced new model architectures designed to amortize the cost of operating directly on raw text. In this paper, we show that a standard Transformer architecture can be used with minimal…
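
The abstract's two central observations, that raw bytes cover any language without a vocabulary and that byte sequences are longer than token sequences, can be illustrated with a minimal Python sketch. This is not the paper's implementation: the helper names `text_to_byte_ids` and `byte_ids_to_text` are hypothetical, the sample strings are made up for illustration, and the whitespace word count is only a crude stand-in for a real subword tokenizer.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Encode text as UTF-8 byte values (integers in 0..255).

    Hypothetical helper for illustration; no vocabulary file is needed.
    """
    return list(text.encode("utf-8"))


def byte_ids_to_text(ids: list[int]) -> str:
    """Invert text_to_byte_ids; malformed byte runs are replaced, not raised."""
    return bytes(ids).decode("utf-8", errors="replace")


# Illustrative inputs (assumptions, not taken from the paper).
samples = ["hello world", "héllo wörld", "こんにちは"]

for s in samples:
    ids = text_to_byte_ids(s)
    # Whitespace splitting is a rough proxy for word-level tokens only.
    print(f"{s!r}: {len(s)} characters, {len(ids)} bytes, "
          f"{len(s.split())} whitespace-separated words")
```

Every string maps to IDs drawn from the same fixed range of 256 values regardless of script, while the byte count grows relative to the character or word count, which is the sequence-length cost the abstract refers to.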

Citation impact

Total citations: 236

FWCI: 27.75
Percentile: 100%
References: 110

Authors: 8

Topics & keywords

Keywords

Computer science
Byte
Security token
Language model
Inference
Transformer
Artificial intelligence
Programming language