End-to-End Temporal Action Detection With Transformer
Huazhong University of Science and Technology · Alibaba Group (China)
Abstract
Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit the flexibility and prevent end-to-end learning. In this paper, we propose an end-to-end Transformer-based method for TAD, termed TadTR. Given a small set of learnable embeddings called action queries, TadTR adaptively extracts temporal context information from the video for each query and directly…
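The query-based idea in the abstract can be illustrated with a minimal sketch, in which each action query attends over the video's temporal features and pools a context vector. This is not the authors' implementation: plain scaled dot-product cross-attention is used as a stand-in, since the excerpt does not say which attention mechanism TadTR uses. The queries are random rather than learned, and all names and sizes (`query_cross_attention`, `NUM_QUERIES`, `T`, `D`) are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def query_cross_attention(queries, video_feats):
    """Pool temporal context for each query (illustrative stand-in only).

    queries:     list of NUM_QUERIES vectors of length D
    video_feats: list of T per-snippet feature vectors of length D
    Returns (contexts, weights): one context vector and one attention
    distribution over the T time steps per query.
    """
    d = len(queries[0])
    contexts, all_weights = [], []
    for q in queries:
        # Similarity between this query and every time step.
        scores = [
            sum(qi * fi for qi, fi in zip(q, f)) / math.sqrt(d)
            for f in video_feats
        ]
        w = softmax(scores)
        # Weighted sum of video features: the query's temporal context.
        ctx = [
            sum(w[t] * video_feats[t][k] for t in range(len(video_feats)))
            for k in range(d)
        ]
        contexts.append(ctx)
        all_weights.append(w)
    return contexts, all_weights


# Hypothetical sizes; random values stand in for learned embeddings
# and for features from a video backbone.
random.seed(0)
NUM_QUERIES, T, D = 4, 16, 8
action_queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(NUM_QUERIES)]
video_feats = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]

contexts, weights = query_cross_attention(action_queries, video_feats)
```

In a full detector, each context vector would then be fed to prediction heads. Because the abstract is truncated at this point, the sketch stops at the context-extraction step.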
Citation impact
- FWCI: 24.26
- Percentile: 100%
- References: 92
Authors: 7
Topics & keywords
- Computer science
- Artificial intelligence
- Transformer
- Locality
- Action recognition
- Classifier (UML)
- Pattern recognition (psychology)
- Machine learning