Contextual Transformer Networks for Visual Recognition
Indexed in Crossref and PubMed
Abstract
The Transformer with self-attention has revolutionized the field of natural language processing and has recently inspired Transformer-style architecture designs that achieve competitive results in numerous computer vision tasks. Nevertheless, most existing designs directly apply self-attention over a 2D feature map to obtain the attention matrix from pairs of isolated queries and keys at each spatial location, leaving the rich contexts among neighboring keys under-exploited. In this work, we design a novel Transformer-style module, i.e., the Contextual Transformer (CoT) block, for visual recognition. This design fully capitalizes on the contextual information among input keys to guide the learning…
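The abstract contrasts standard self-attention, where each attention weight is computed from one isolated query-key pair, with a design that exploits context among neighboring keys. The pure-Python sketch below illustrates only that contrast, under stated assumptions: query, key, and value projections are the identity, and a plain k × k neighborhood average is used as a hypothetical stand-in for key context. It is not the CoT block itself, whose full design is not given in this excerpt; the helper names (`flatten`, `attention_matrix`, `contextual_keys`, `attend`) are illustrative only.

```python
import math

# Feature maps are nested lists: fmap[row][col] is a C-dimensional vector.

def flatten(fmap):
    """Flatten an H x W x C feature map into a list of H*W position vectors."""
    return [vec for row in fmap for vec in row]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_matrix(queries, keys):
    """Scaled dot-product attention: entry (i, j) depends only on query i and key j."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    return [softmax([dot(q, k) * scale for k in keys]) for q in queries]

def attend(attn, values):
    """Mix value vectors using each row of the attention matrix as weights."""
    dim = len(values[0])
    return [[sum(w * v[c] for w, v in zip(row, values)) for c in range(dim)]
            for row in attn]

def contextual_keys(fmap, k=3):
    """Hypothetical stand-in for key context: replace each key with the mean of
    its k x k neighborhood (border positions use only in-bounds neighbors)."""
    h, w, r = len(fmap), len(fmap[0]), k // 2
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            neigh = [fmap[a][b]
                     for a in range(max(0, i - r), min(h, i + r + 1))
                     for b in range(max(0, j - r), min(w, j + r + 1))]
            dim = len(fmap[i][j])
            row.append([sum(v[c] for v in neigh) / len(neigh) for c in range(dim)])
        out.append(row)
    return out

# Usage on a 2 x 2 map with 2 channels; Q = K = V = X (identity projections, an assumption).
fmap = [[[1.0, 0.0], [0.0, 1.0]],
        [[1.0, 1.0], [0.0, 0.0]]]
x = flatten(fmap)
isolated = attention_matrix(x, x)                                # isolated query-key pairs
contextual = attention_matrix(x, flatten(contextual_keys(fmap)))  # neighborhood-aware keys
output = attend(contextual, x)
# Every 3 x 3 window covers the whole 2 x 2 map, so all contextual keys coincide.
print(contextual[0])  # → [0.25, 0.25, 0.25, 0.25]
```

In the isolated case, each weight reflects only how well one query matches one key; in the contextual case, each key already summarizes its neighbors before it is compared with a query. A learned aggregation (for example a convolution) would replace the fixed average in a real model.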
Citation impact
- Total citations: 691
- FWCI: 65.56
- Percentile: 100%
- References: 98
Keywords
- Computer science
- Transformer
- Artificial intelligence
- Segmentation
- Convolutional neural network
- Pattern recognition (psychology)
- Computer vision
- Engineering
UN Sustainable Development Goals
- Quality Education