Article | 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) | Jun 1, 2022 | Closed access
MaskGIT: Masked Generative Image Transformer
Indexed in Crossref
Abstract
Generative transformers have rapidly grown in popularity in the computer vision community for synthesizing high-fidelity, high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens and decode it sequentially in raster-scan order (i.e., line by line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image…
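The abstract contrasts raster-scan decoding, where each token sees only the tokens before it, with a bidirectional decoder trained to predict randomly masked tokens. The Python sketch below illustrates those two ideas on a flattened grid of integer codebook ids. It is a minimal illustration under stated assumptions, not the paper's implementation: `MASK_ID`, the helper names, and the uniformly sampled mask ratio are hypothetical choices.

```python
import random
from typing import List, Tuple

MASK_ID = -1  # hypothetical id for the special [MASK] token (assumption)


def causal_mask(n: int) -> List[List[bool]]:
    """Raster-scan autoregressive decoding: token i may attend only to tokens 0..i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n: int) -> List[List[bool]]:
    """Bidirectional decoder: every token may attend to every other token."""
    return [[True] * n for _ in range(n)]


def mask_tokens(
    tokens: List[int], mask_ratio: float, rng: random.Random
) -> Tuple[List[int], List[int]]:
    """Replace a random subset of tokens with MASK_ID.

    Returns the corrupted sequence and the sorted masked positions; a
    masked-token loss would be computed only at those positions.
    """
    if not tokens:
        raise ValueError("tokens must be non-empty")
    if not 0.0 < mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in (0, 1]")
    # Mask at least one token so every example has a prediction target.
    num_masked = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), num_masked))
    corrupted = list(tokens)
    for p in positions:
        corrupted[p] = MASK_ID
    return corrupted, positions


if __name__ == "__main__":
    rng = random.Random(0)
    grid = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4x4 token grid
    tokens = [t for row in grid for t in row]  # flatten line by line (raster order)
    ratio = rng.uniform(0.1, 1.0)  # uniform ratio: an illustrative assumption
    corrupted, targets = mask_tokens(tokens, ratio, rng)
    print(corrupted)
    print(targets)
```

The two attention masks make the difference concrete: under `causal_mask`, the first token in raster order can see nothing that follows it, while under `bidirectional_mask` a masked token can draw on context from every direction in the grid.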
Citation impact
- Total citations: 359
- FWCI: 19.18
- Percentile: 100%
- References: 70
Authors: 5
Keywords
- Computer science
- Transformer
- Artificial intelligence
- Generative grammar
- Image warping
- Generative model
- Inference
- Decoding methods