MaskGIT: Masked Generative Image Transformer

Google (United States)

Indexed in Crossref

Abstract

Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image…
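The training objective described in the abstract (predicting randomly masked tokens) can be illustrated with a minimal sketch. This is not the authors' implementation: the `MASK_ID` value, the codebook size of 1024, the 16-token sequence, and the 0.5 mask ratio are all assumptions chosen for illustration, and the helper `mask_tokens` is hypothetical. The inference procedure is not sketched because the abstract is truncated before describing it.

```python
import random

MASK_ID = -1  # assumption: a placeholder id standing in for a special mask token


def mask_tokens(tokens, mask_ratio, rng):
    """Replace a random subset of token ids with MASK_ID.

    Returns the masked sequence and the sorted positions that were masked.
    """
    n_mask = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_ID
    return masked, positions


# Assumption: an image already encoded as 16 discrete token ids
# drawn from a codebook of 1024 entries (values are made up).
rng = random.Random(0)
tokens = [rng.randrange(1024) for _ in range(16)]

masked, positions = mask_tokens(tokens, mask_ratio=0.5, rng=rng)

# A bidirectional model would see every unmasked token at once and be
# trained to recover the original ids at the masked positions only.
targets = [tokens[p] for p in positions]
```

Because the unmasked tokens on both sides of each masked position stay visible, a model trained this way can attend in all directions rather than only to earlier tokens in raster order.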

Citation impact

  • Total citations: 359
  • FWCI: 19.18
  • Percentile: 100%
  • References: 70
Authors

5

Topics & keywords

Keywords
  • Computer science
  • Transformer
  • Artificial intelligence
  • Generative grammar
  • Image warping
  • Generative model
  • Inference
  • Decoding methods