On the Integration of Self-Attention and Convolution
Tsinghua University · Huawei Technologies (China) · +1 more institution
Abstract
Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered two peer approaches that are distinct from each other. In this paper, we show that there exists a strong underlying relation between them, in the sense that the bulk of the computation in these two paradigms is in fact done with the same operation. Specifically, we first show that a traditional convolution with kernel size k × k can be decomposed into k² individual 1 × 1 convolutions, followed by shift and summation operations. Then, we interpret the projections of queries, keys, and values in the self-attention module as multiple 1 × 1 convolutions, followed by the computation of attention…
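The abstract's first claim, that a k × k convolution equals k² separate 1 × 1 convolutions followed by shifts and a sum, can be checked numerically. Below is a minimal NumPy sketch, not the authors' implementation, assuming stride 1, odd k, zero "same" padding, and channel-first (C, H, W) arrays; all function names are hypothetical. The last function reuses the same 1 × 1 operation for the query, key, and value projections and then, as an assumption on our part, applies plain single-head global softmax attention, since the abstract is cut off before describing that stage.

```python
import numpy as np


def conv1x1(x, w):
    """1x1 convolution: a per-pixel linear map over channels.
    x: (C_in, H, W), w: (C_out, C_in) -> (C_out, H, W)."""
    return np.einsum("oc,chw->ohw", w, x)


def conv2d_direct(x, w):
    """Reference k x k convolution (cross-correlation), stride 1, zero 'same' padding.
    x: (C_in, H, W), w: (C_out, C_in, k, k) with odd k."""
    c_out, _, k, _ = w.shape
    p = k // 2
    _, H, W = x.shape
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding (default 'constant' mode)
    out = np.zeros((c_out, H, W))
    for i in range(H):
        for j in range(W):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) window centred on (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def conv2d_as_1x1_shift_sum(x, w):
    """Same result as conv2d_direct, computed as k*k 1x1 convolutions,
    each followed by a spatial shift, then summed."""
    c_out, _, k, _ = w.shape
    p = k // 2
    _, H, W = x.shape
    out = np.zeros((c_out, H, W))
    for a in range(k):
        for b in range(k):
            # Stage 1: 1x1 convolution with the kernel weights at offset (a, b).
            y = conv1x1(x, w[:, :, a, b])
            # Stage 2: shift by (a - p, b - p), zero-filling the border, and accumulate.
            yp = np.pad(y, ((0, 0), (p, p), (p, p)))
            out += yp[:, a:a + H, b:b + W]
    return out


def self_attention_via_1x1(x, wq, wk, wv):
    """Single-head global self-attention whose q/k/v projections are 1x1 convolutions.
    Assumption: attention over all pixels; the paper's module may differ."""
    q, k, v = conv1x1(x, wq), conv1x1(x, wk), conv1x1(x, wv)
    d, H, W = q.shape
    qf, kf, vf = q.reshape(d, -1), k.reshape(d, -1), v.reshape(v.shape[0], -1)
    logits = qf.T @ kf / np.sqrt(d)                  # (HW, HW) query-key scores
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)          # softmax over keys
    return (vf @ attn.T).reshape(v.shape[0], H, W)   # weighted sum of values per pixel


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 6, 5))
    w = rng.standard_normal((4, 3, 3, 3))
    # The two convolution paths agree up to floating-point error.
    assert np.allclose(conv2d_direct(x, w), conv2d_as_1x1_shift_sum(x, w))
    out = self_attention_via_1x1(x, rng.standard_normal((2, 3)),
                                 rng.standard_normal((2, 3)),
                                 rng.standard_normal((5, 3)))
    print(out.shape)
```

The shift step works because a 1 × 1 convolution is linear and pointwise, so zero padding commutes with it: each kernel offset contributes one channel-mixing matrix multiply, and only the spatial offset differs between the k² terms. This is the sense in which both paradigms spend most of their computation on the same 1 × 1 operation.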
Citation impact
- FWCI: 28.50
- Percentile: 100%
- References: 79
Authors: 7
Topics & keywords
- Convolution (computer science)
- Computer science
- Kernel (algebra)
- Computation
- Overhead (engineering)
- Representation (politics)
- Code (set theory)
- Theoretical computer science