BiFormer: Vision Transformer with Bi-Level Routing Attention

Zhu, Lei; Wang, Xinjiang; Ke, Zhanghan; Zhang, Wayne; Lau, Rynson W. H.

doi:10.1109/cvpr52729.2023.00995

Article · Jun 1, 2023 · Closed access

City University of Hong Kong

Indexed in Crossref

Abstract

As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. Specifically, for a query,…
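The cost argument in the abstract can be made concrete with a small count of query-key pairs. The sketch below is illustrative only and is not taken from the paper: the function names, the non-overlapping window layout, and the 56 x 56 feature map with 7 x 7 windows are assumptions chosen for the example. It compares full attention, where every token interacts with every other token, against the handcrafted local-window sparsity the abstract mentions.

```python
def full_attention_pairs(height: int, width: int) -> int:
    """Query-key pairs when every token attends to every token."""
    n = height * width
    return n * n


def window_attention_pairs(height: int, width: int, window: int) -> int:
    """Query-key pairs when attention stays inside non-overlapping
    window x window blocks (an assumed layout, for illustration)."""
    if height % window or width % window:
        raise ValueError("height and width must be divisible by window")
    tokens_per_window = window * window
    num_windows = (height // window) * (width // window)
    # Each window computes all pairs among its own tokens only.
    return num_windows * tokens_per_window * tokens_per_window


if __name__ == "__main__":
    h, w, win = 56, 56, 7  # hypothetical feature-map and window sizes
    print("full:  ", full_attention_pairs(h, w))
    print("window:", window_attention_pairs(h, w, win))
```

Full attention grows quadratically with the number of tokens, while the windowed count grows linearly for a fixed window size. The windowed pattern is fixed in advance regardless of image content, which is the limitation the abstract contrasts with content-aware routing.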

Citation impact

1,033

total citations

FWCI: 117.59
Percentile: 100%
References: 67

Authors

5

Topics & keywords

Keywords

Computer science
Memory footprint
Security token
Computation
Transformer
Segmentation
Artificial intelligence
Parallel computing
