Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Indexed in arXiv, DataCite
Abstract
We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple: First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images…
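The abstract says Parti first encodes images into discrete tokens with ViT-VQGAN and then models those tokens as a target sequence. The sketch below is a rough illustration of what "image tokens" means. It shows the nearest-codebook lookup that vector-quantized tokenizers use in general. It is not Parti's or ViT-VQGAN's code. The tiny 2-D codebook, the patch embeddings, and the function names `quantize` and `dequantize` are hypothetical. A real tokenizer learns its codebook and produces the patch embeddings with a Transformer encoder.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(patch_embeddings: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each patch embedding to the index of its nearest codebook entry.

    The resulting list of integers plays the role of the discrete
    image-token sequence that a sequence-to-sequence model would be
    trained to predict (hypothetical toy version, not Parti's tokenizer).
    """
    tokens = []
    for emb in patch_embeddings:
        # On ties, min() keeps the lowest codebook index.
        best = min(range(len(codebook)), key=lambda k: squared_distance(emb, codebook[k]))
        tokens.append(best)
    return tokens


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[Vector]:
    """Look each token back up in the codebook, as a decoder's input would."""
    return [list(codebook[t]) for t in tokens]


if __name__ == "__main__":
    # Hypothetical 3-entry codebook and three patch embeddings.
    codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5]]
    patches = [[0.9, 1.2], [-0.8, 0.4], [0.1, -0.1]]
    tokens = quantize(patches, codebook)
    print(tokens)                        # [1, 2, 0]
    print(dequantize(tokens, codebook))  # [[1.0, 1.0], [-1.0, 0.5], [0.0, 0.0]]
```

In the sequence-to-sequence framing described above, a list such as `[1, 2, 0]` is the "translation" target for a text prompt. Quantization is lossy, so `dequantize` only recovers the nearest codebook vectors, not the original embeddings.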
Citation impact
340 total citations
Authors
17
Topics & keywords
Keywords
- Computer science
- Encoder
- Autoregressive model
- Transformer
- Language model
- Fidelity
- Artificial intelligence
- ENCODE