TransMOT: Spatial-Temporal Graph Transformer for Multiple Object Tracking

Chu, Peng; Wang, Jiang; You, Quanzeng; Ling, Haibin; Liu, Zicheng

doi:10.1109/wacv56688.2023.00485

Article · 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) · Jan 1, 2023 · Green OA

Microsoft Research (United Kingdom) · Stony Brook University

Indexed in: arXiv, Crossref

Abstract

Tracking multiple objects in videos relies on modeling the spatial-temporal interactions of the objects. In this paper, we propose TransMOT, which leverages powerful graph transformers to efficiently model the spatial and temporal interactions among the objects. TransMOT is capable of effectively modeling the interactions of a large number of objects by arranging the trajectories of the tracked targets and detection candidates as a set of sparse weighted graphs, and constructing a spatial graph transformer encoder layer, a temporal transformer encoder layer, and a spatial graph transformer decoder layer based on the graphs. Through end-to-end learning, TransMOT can exploit the spatial-temporal clues to…
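
As a rough illustration of what "a set of sparse weighted graphs" could look like for one video frame, the sketch below links two boxes only when they overlap and weights the edge by their intersection-over-union (IoU). The IoU weighting, the box format, and the names `iou` and `build_sparse_graph` are assumptions made for this example; the abstract text shown here does not state how the graphs are built.

```python
from typing import Dict, List, Tuple

# Hypothetical box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def build_sparse_graph(boxes: List[Box]) -> Dict[Tuple[int, int], float]:
    """Return a symmetric edge dictionary {(i, j): weight}.

    Assumption for illustration: an edge exists only where two boxes
    overlap, so distant objects contribute no entry at all.
    """
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            w = iou(boxes[i], boxes[j])
            if w > 0.0:
                edges[(i, j)] = w
                edges[(j, i)] = w
    return edges


# Two overlapping boxes and one far away: only the first pair is connected.
boxes = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (10.0, 10.0, 12.0, 12.0)]
graph = build_sparse_graph(boxes)
```

Storing only the non-zero edges keeps the structure small when a frame holds many objects that mostly do not overlap, which is the kind of sparsity the abstract alludes to.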

Citation impact

Total citations: 236

FWCI: 12.84
Percentile: 100%
References: 87

Authors: 5

Topics & keywords

Keywords

Computer science
Transformer
Graph
Artificial intelligence
Video tracking
Computer vision
Object (grammar)
Theoretical computer science

Funding

National Science Foundation