End-to-End Temporal Action Detection With Transformer

Liu, Xiaolong; Wang, Qimeng; Hu, Yao; Tang, Xu; Zhang, Shiwei; Bai, Song; Bai, Xiang

doi:10.1109/tip.2022.3195321

Article · IEEE Transactions on Image Processing · Jan 1, 2022 · Green open access

Affiliations: Huazhong University of Science and Technology · Alibaba Group (China)

Indexed in: arXiv, Crossref, PubMed

Abstract

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit the flexibility and prevent end-to-end learning. In this paper, we propose an end-to-end Transformer-based method for TAD, termed TadTR. Given a small set of learnable embeddings called action queries, TadTR adaptively extracts temporal context information from the video for each query and directly…
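The idea of action queries that each gather temporal context from the video can be illustrated with a minimal sketch. It assumes plain single-head scaled dot-product cross-attention: every query attends over all snippet features, and the features serve directly as both keys and values with no learned projections. This is an illustrative assumption, not TadTR's implementation. The paper's actual attention module, feature extractor, and query training may all differ. The names query_cross_attention, softmax, dot, queries and features are hypothetical, and the toy vectors are hand-picked rather than learned.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def query_cross_attention(
    queries: List[Vector], features: List[Vector]
) -> Tuple[List[Vector], List[List[float]]]:
    """Hypothetical sketch: each query attends over all T snippet features.

    Features act as both keys and values (no learned projections).
    Returns one context vector and one attention distribution per query.
    """
    num_steps = len(features)
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)  # standard 1/sqrt(d) scaling
    contexts: List[Vector] = []
    weights: List[List[float]] = []
    for q in queries:
        # Similarity of this query to every temporal position.
        w = softmax([dot(q, f) * scale for f in features])
        # Context = attention-weighted average of the snippet features.
        ctx = [sum(w[t] * features[t][k] for t in range(num_steps)) for k in range(dim)]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


if __name__ == "__main__":
    # Toy video: 4 snippets with 2-D features; snippets 0-1 differ from 2-3.
    features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    # Two toy queries (hand-picked here; a real model would learn them).
    queries = [[5.0, 0.0], [0.0, 5.0]]
    contexts, weights = query_cross_attention(queries, features)
    for i, w in enumerate(weights):
        print(f"query {i}: weights over time = {[round(x, 3) for x in w]}")
```

In the toy run, each query places most of its attention weight on the snippets whose features resemble it, so each query ends up with its own summary of the relevant part of the video.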

Citation impact

Total citations: 259

FWCI: 24.26
Percentile: 100%
References: 92

Authors: 7

Topics & keywords

Keywords

Computer science
Artificial intelligence
Transformer
Locality
Action recognition
Classifier
Pattern recognition
Machine learning

Funding

National Key Research and Development Program of China
Award: 2018YFB1004600