CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows

Dong, Xiaoyi; Bao, Jianmin; Chen, Dongdong; Zhang, Weiming; Yu, Nenghai; Yuan, Lu; Chen, Dong; Guo, Baining

doi:10.1109/cvpr52688.2022.01181

Article · 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) · Jun 1, 2022 · Closed access

University of Science and Technology of China · Microsoft Research Asia (China) · +1 more institution

Indexed in Crossref

Abstract

We present CSWin Transformer, an efficient and effective Transformer-based backbone for general-purpose vision tasks. A challenging issue in Transformer design is that global self-attention is very expensive to compute whereas local self-attention often limits the field of interactions of each token. To address this issue, we develop the Cross-Shaped Window self-attention mechanism for computing self-attention in the horizontal and vertical stripes in parallel that form a cross-shaped window, with each stripe obtained by splitting the input feature into stripes of equal width. We provide a mathematical analysis of the effect of the stripe width and vary the stripe width for different layers of the Transformer…
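The stripe-based attention described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a single feature map of shape (H, W, C), uses the input itself as queries, keys, and values (no learned projections or positional encoding), and runs the two orientations "in parallel" by giving each one half of the channels. The function names and the parameter `sw` (stripe width) are hypothetical, chosen for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def stripe_attention(x, sw, horizontal):
    """Self-attention restricted to equal-width stripes.

    x: array of shape (H, W, C). Assumption of this sketch: q = k = v = x.
    sw: stripe width in rows (horizontal) or columns (vertical).
    """
    if not horizontal:
        # Vertical stripes are horizontal stripes of the transposed map.
        out = stripe_attention(x.transpose(1, 0, 2), sw, True)
        return out.transpose(1, 0, 2)
    H, W, C = x.shape
    assert H % sw == 0, "stripe width must divide the feature map size"
    # Each stripe holds sw full rows, flattened to sw * W tokens.
    tokens = x.reshape(H // sw, sw * W, C)
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    out = softmax(scores, axis=-1) @ tokens
    return out.reshape(H, W, C)


def cross_shaped_window_attention(x, sw):
    # Assumption: half of the channels attend within horizontal stripes,
    # the other half within vertical stripes; results are concatenated.
    half = x.shape[-1] // 2
    h_out = stripe_attention(x[..., :half], sw, horizontal=True)
    v_out = stripe_attention(x[..., half:], sw, horizontal=False)
    return np.concatenate([h_out, v_out], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))
y = cross_shaped_window_attention(x, sw=2)
```

In this sketch, a token's combined receptive field is the union of its horizontal and vertical stripe, which is the cross shape the abstract refers to. Setting `sw` to the full height or width makes the corresponding group attend globally, so the stripe width trades cost against the size of the interaction field.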

Citation impact

Total citations: 1,218

FWCI: 369.46
Percentile: 100%
References: 83

Authors: 8

Topics & keywords

Keywords

Transformer
Computer science
Computation
Segmentation
FLOPS
Artificial intelligence
Limiting
Algorithm
