OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning

Hong, Lingyi; Yan, Shilin; Zhang, Renrui; Li, Wanyun; Zhou, Xinyu; Guo, Pinxue; Jiang, Kaixun; Chen, Yi-Ting; Li, Jinglun; Chen, Zhaoyu; Zhang, Wenqiang

doi:10.1109/cvpr52733.2024.01805

Article · Jun 16, 2024 · Closed access

Fudan University · University of Hong Kong

Indexed in Crossref

Abstract

Visual object tracking aims to localize the target object in each frame based on its initial appearance in the first frame. Depending on the input modality, tracking tasks can be divided into RGB tracking and RGB+X (e.g., RGB+N and RGB+D) tracking. Despite the different input modalities, the core aspect of tracking is temporal matching. Based on this common ground, we present a general framework to unify various tracking tasks, termed OneTracker. OneTracker first performs large-scale pre-training on an RGB tracker called Foundation Tracker. This pre-training phase equips the Foundation Tracker with a stable ability to estimate the location of the target object. Then we regard other modality…

Citation impact

Total citations: 110

FWCI: 24.65
Percentile: 100%
References: 140

Authors: 11

Topics & keywords

Keywords

Foundation (evidence)
Computer science
Object (grammar)
Artificial intelligence
Tracking (education)
Computer vision
Psychology
History