On the Integration of Self-Attention and Convolution

Tsinghua University · Huawei Technologies (China) · +1 more institution

Indexed in Crossref

Abstract

Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered as two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of the computation in these two paradigms is in fact done with the same operation. Specifically, we first show that a traditional convolution with kernel size k × k can be decomposed into k² individual 1 × 1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in the self-attention module as multiple 1 × 1 convolutions, followed by the computation of attention…
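
The decomposition described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes stride 1, an odd kernel size, zero padding, and the cross-correlation convention (no kernel flip), and every function name in it (conv_direct, conv1x1, shift, conv_decomposed, max_abs_diff) is made up for illustration. It uses plain Python lists so that it runs without any third-party library.

```python
import random

def conv_direct(x, w):
    """Reference k x k convolution: stride 1, zero padding, odd k (assumed).

    x: [c_in][h][wd], w: [c_out][c_in][k][k]; cross-correlation convention.
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    r = k // 2
    y = [[[0.0] * wd for _ in range(h)] for _ in range(c_out)]
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                for c in range(c_in):
                    for p in range(k):
                        for q in range(k):
                            ii, jj = i + p - r, j + q - r
                            if 0 <= ii < h and 0 <= jj < wd:
                                y[o][i][j] += w[o][c][p][q] * x[c][ii][jj]
    return y

def conv1x1(x, w_pq):
    """1 x 1 convolution: the same linear map w_pq ([c_out][c_in]) at every pixel."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    return [[[sum(w_pq[o][c] * x[c][i][j] for c in range(c_in))
              for j in range(wd)] for i in range(h)] for o in range(len(w_pq))]

def shift(f, di, dj):
    """Return g with g[o][i][j] = f[o][i + di][j + dj], zero where out of bounds."""
    h, wd = len(f[0]), len(f[0][0])
    return [[[ch[i + di][j + dj] if 0 <= i + di < h and 0 <= j + dj < wd else 0.0
              for j in range(wd)] for i in range(h)] for ch in f]

def conv_decomposed(x, w):
    """The same convolution as k*k separate 1 x 1 convolutions, shifted and summed."""
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    c_out, k = len(w), len(w[0][0])
    r = k // 2
    y = [[[0.0] * wd for _ in range(h)] for _ in range(c_out)]
    for p in range(k):
        for q in range(k):
            # Weights of kernel position (p, q) form one 1 x 1 convolution.
            w_pq = [[w[o][c][p][q] for c in range(c_in)] for o in range(c_out)]
            g = shift(conv1x1(x, w_pq), p - r, q - r)
            for o in range(c_out):
                for i in range(h):
                    for j in range(wd):
                        y[o][i][j] += g[o][i][j]
    return y

def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two [c][h][wd] arrays."""
    return max(abs(u - v) for ca, cb in zip(a, b)
               for ra, rb in zip(ca, cb) for u, v in zip(ra, rb))

if __name__ == "__main__":
    random.seed(0)
    c_in, c_out, h, wd, k = 3, 4, 5, 6, 3
    x = [[[random.uniform(-1, 1) for _ in range(wd)] for _ in range(h)]
         for _ in range(c_in)]
    w = [[[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
          for _ in range(c_in)] for _ in range(c_out)]
    # The two results should agree up to floating-point rounding.
    print(max_abs_diff(conv_direct(x, w), conv_decomposed(x, w)))
```

In this sketch all multiplications happen inside conv1x1, while shift and the final accumulation only move and add values. That is one way to read the abstract's claim that the bulk of the computation sits in the 1 × 1 convolutions.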

Citation impact

  • Total citations: 521
  • FWCI: 28.50
  • Percentile: 100%
  • References: 79

Authors

7

Topics & keywords

Keywords
  • Convolution (computer science)
  • Computer science
  • Kernel (algebra)
  • Computation
  • Overhead (engineering)
  • Representation (politics)
  • Code (set theory)
  • Theoretical computer science