Article · IEEE Transactions on Image Processing · Jan 1, 2022 · Green OA

End-to-End Temporal Action Detection With Transformer

Huazhong University of Science and Technology · Alibaba Group (China)

Indexed in: arXiv, Crossref, PubMed

Abstract

Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging task in video understanding. Previous methods tackle this task with complicated pipelines. They often need to train multiple networks and involve hand-designed operations, such as non-maximal suppression and anchor generation, which limit the flexibility and prevent end-to-end learning. In this paper, we propose an end-to-end Transformer-based method for TAD, termed TadTR. Given a small set of learnable embeddings called action queries, TadTR adaptively extracts temporal context information from the video for each query and directly…
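
For readers unfamiliar with the hand-designed post-processing the abstract mentions, the following is a minimal sketch of greedy temporal non-maximum suppression, the kind of step that earlier pipelines rely on. It is not taken from the paper and is not part of TadTR. The `Segment` layout, the function names `temporal_iou` and `temporal_nms`, and the 0.5 threshold are illustrative assumptions.

```python
from typing import List, Tuple

# Hypothetical layout for a detected action: (start_sec, end_sec, confidence).
Segment = Tuple[float, float, float]


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D temporal intervals."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def temporal_nms(segments: List[Segment], iou_threshold: float = 0.5) -> List[Segment]:
    """Greedy NMS: keep the highest-scoring segment, drop heavy overlaps, repeat."""
    kept: List[Segment] = []
    # Visit candidates from most to least confident.
    for seg in sorted(segments, key=lambda s: s[2], reverse=True):
        # Keep a candidate only if it does not overlap any kept segment too much.
        if all(temporal_iou(seg, k) <= iou_threshold for k in kept):
            kept.append(seg)
    return kept


if __name__ == "__main__":
    candidates = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
    print(temporal_nms(candidates))
```

The threshold and the greedy ordering are both manual design choices, which is the sort of hand tuning the abstract argues against.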

Citation impact

  • Total citations: 259
  • FWCI: 24.26
  • Percentile: 100%
  • References: 92

Authors

7 authors

Topics & keywords

Keywords
  • Computer science
  • Artificial intelligence
  • Transformer
  • Locality
  • Action recognition
  • Classifier (UML)
  • Pattern recognition (psychology)
  • Machine learning

Funding