MaskGIT: Masked Generative Image Transformer

Chang, Huiwen; Zhang, Han; Jiang, Lu; Liu, Ce; Freeman, William T.

doi:10.1109/cvpr52688.2022.01103

Article; 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); Jun 1, 2022; Closed access

Google (United States)

Indexed in Crossref

Abstract

Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image…
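
The training objective described in the abstract (predicting randomly masked tokens) can be illustrated with a short Python sketch of the input-corruption step. This is not the authors' code: the placeholder `MASK_ID`, the helper names `cosine_mask_ratio` and `mask_tokens`, and the cosine masking ratio are all assumptions chosen for illustration, and the bidirectional transformer itself is omitted. Each call samples a masking ratio, hides that fraction of the token sequence, and keeps the original tokens only at the hidden positions, where a prediction loss would be computed.

```python
import math
import random

MASK_ID = -1  # hypothetical id for the special [MASK] token (assumption)


def cosine_mask_ratio(r):
    """Fraction of tokens to mask for r in [0, 1).

    Assumption: a cosine schedule gamma(r) = cos(pi * r / 2), used here
    only as an illustrative choice of a decreasing masking ratio.
    """
    return math.cos(math.pi * r / 2)


def mask_tokens(tokens, rng):
    """Randomly mask one training sequence of image tokens.

    Returns (inputs, targets): `inputs` has MASK_ID at masked positions,
    and `targets` holds the original token at masked positions and None
    elsewhere, so a loss would only be computed where the model predicts.
    """
    n = len(tokens)
    ratio = cosine_mask_ratio(rng.random())
    # Always mask at least one token so every example carries a target.
    num_masked = max(1, math.ceil(ratio * n))
    masked_positions = set(rng.sample(range(n), num_masked))
    inputs = [MASK_ID if i in masked_positions else t for i, t in enumerate(tokens)]
    targets = [t if i in masked_positions else None for i, t in enumerate(tokens)]
    return inputs, targets
```

For example, `mask_tokens(list(range(16)), random.Random(0))` corrupts a hypothetical 4x4 grid of codebook indices. Since the model is asked to recover each masked token from the unmasked tokens on every side of it, this setup imposes no raster-scan order on the prediction.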

Citation impact

Total citations: 359
FWCI (field-weighted citation impact): 19.18
Percentile: 100%
References: 70

Authors: 5

Topics & keywords

Keywords

Computer science
Transformer
Artificial intelligence
Generative grammar
Image warping
Generative model
Inference
Decoding methods