Scaling up GANs for Text-to-Image Synthesis

Kang, Minguk; Zhu, Jun-Yan; Zhang, Richard; Park, Jaesik; Shechtman, Eli; Paris, Sylvain; Park, Taesung

doi:10.1109/cvpr52729.2023.00976

Article · Jun 1, 2023 · Closed access

Pohang University of Science and Technology · Adobe Systems (United States) · +2 more institutions

Indexed in Crossref

Abstract

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture to design generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL·E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit,…

Citation impact

Total citations: 364

FWCI: 41.36
Percentile: 100%
References: 141

Authors: 7

Topics & keywords

Keywords

Computer science
Image (mathematics)
Autoregressive model
Interpolation (computer graphics)
Inference
Generative model
Architecture
Generative grammar

UN Sustainable Development Goals

Sustainable cities and communities
