Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Yu, Jiahui; Xu, Yuanzhong; Koh, Jing Yu; Luong, Thang M.; Baid, Gunjan; Wang, Zirui; Vasudevan, Vijay K.; Ku, Alexander; Yang, Yinfei; Ayan, Burcu Karagol; Hutchinson, Ben; Han, Wei; Parekh, Zarana; Li, Xin; Zhang, Han; Baldridge, Jason; Wu, Yonghui

doi:10.48550/arxiv.2206.10789

Preprint, arXiv (Cornell University), Jun 22, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

We present the Pathways Autoregressive Text-to-Image (Parti) model, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge. Parti treats text-to-image generation as a sequence-to-sequence modeling problem, akin to machine translation, with sequences of image tokens as the target outputs rather than text tokens in another language. This strategy can naturally tap into the rich body of prior work on large language models, which have seen continued advances in capabilities and performance through scaling data and model sizes. Our approach is simple: First, Parti uses a Transformer-based image tokenizer, ViT-VQGAN, to encode images…
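The sequence-to-sequence framing described in the abstract can be illustrated with a toy sketch. This is not the Parti or ViT-VQGAN implementation: a hand-written codebook with nearest-neighbour lookup stands in for a learned image tokenizer, and every name here (`quantize`, `make_seq2seq_example`, `CODEBOOK`, `PATCHES`, `BOS_ID`) is hypothetical. It only shows how an image could become a sequence of discrete token ids that a decoder is trained to predict from text tokens.

```python
# Toy illustration only. A fixed codebook and nearest-neighbour lookup stand in
# for a learned image tokenizer; none of these names come from the paper.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def quantize(patches, codebook):
    """Map each patch vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(patch, codebook[i]))
        for patch in patches
    ]

def make_seq2seq_example(text_tokens, image_tokens, bos_id):
    """Build (source, decoder_input, target) for teacher-forced training.

    The decoder input is the target shifted right by one position, so each
    image token is predicted from the text and the image tokens before it.
    """
    decoder_input = [bos_id] + image_tokens[:-1]
    return text_tokens, decoder_input, image_tokens

# Hypothetical 4-entry codebook and four 2-D "patch embeddings".
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
PATCHES = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.95, 0.9)]
BOS_ID = len(CODEBOOK)  # assumed start-of-image token id, outside the codebook range

image_tokens = quantize(PATCHES, CODEBOOK)
source, decoder_input, target = make_seq2seq_example([7, 3, 5], image_tokens, BOS_ID)
```

Here the text token ids `[7, 3, 5]` are placeholders for a tokenized prompt. In a real system the codebook and the patch embeddings would be learned, and the token sequence would be far longer.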

Citation impact

Total citations: 340

Authors: 17

Topics & keywords

Keywords

Computer science
Encoder
Autoregressive model
Transformer
Language model
Fidelity
Artificial intelligence
ENCODE

UN Sustainable Development Goals

Quality Education
