Article | Jun 1, 2023 | Closed access

BiFormer: Vision Transformer with Bi-Level Routing Attention

City University of Hong Kong

Indexed in Crossref

Abstract

As the core building block of vision transformers, attention is a powerful tool to capture long-range dependency. However, such power comes at a cost: it incurs a huge computation burden and heavy memory footprint as pairwise token interaction across all spatial locations is computed. A series of works attempt to alleviate this problem by introducing handcrafted and content-agnostic sparsity into attention, such as restricting the attention operation to be inside local windows, axial stripes, or dilated windows. In contrast to these approaches, we propose a novel dynamic sparse attention via bi-level routing to enable a more flexible allocation of computations with content awareness. Specifically, for a query,…

Citation impact

Total citations: 1,033
FWCI: 117.59
Percentile: 100%
References: 67

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Memory footprint
  • Security token
  • Computation
  • Transformer
  • Segmentation
  • Artificial intelligence
  • Parallel computing