OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning
Fudan University · University of Hong Kong
Abstract
Visual object tracking aims to localize the target object in each frame based on its initial appearance in the first frame. Depending on the input modality, tracking tasks can be divided into RGB tracking and RGB+X (e.g., RGB+N and RGB+D) tracking. Despite the different input modalities, the core of tracking is temporal matching. Based on this common ground, we present a general framework to unify various tracking tasks, termed OneTracker. OneTracker first performs large-scale pre-training on an RGB tracker called Foundation Tracker. This pre-training phase equips the Foundation Tracker with a stable ability to estimate the location of the target object. Then we regard other modality…
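The abstract frames tracking as temporal matching: finding, in each new frame, the region that best matches the target's first-frame appearance. As a minimal illustration of that idea only, the sketch below uses classical zero-mean normalized cross-correlation (NCC) template matching on grayscale NumPy arrays. It is not OneTracker's method, which relies on a large-scale pre-trained Foundation Tracker rather than raw pixel correlation; the function names `locate_target` and `track` and the `(row, col, h, w)` box format are hypothetical choices for this example.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def locate_target(template: np.ndarray, frame: np.ndarray) -> tuple[int, int]:
    """Return the (row, col) top-left corner in `frame` whose window best
    matches `template` under zero-mean normalized cross-correlation."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum()) + 1e-8  # epsilon avoids division by zero
    # Every th x tw window of the frame: shape (H-th+1, W-tw+1, th, tw).
    windows = sliding_window_view(frame, (th, tw))
    w = windows - windows.mean(axis=(-2, -1), keepdims=True)
    w_norm = np.sqrt((w ** 2).sum(axis=(-2, -1))) + 1e-8
    scores = (w * t).sum(axis=(-2, -1)) / (w_norm * t_norm)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col)


def track(frames: list[np.ndarray],
          box: tuple[int, int, int, int]) -> list[tuple[int, int]]:
    """Crop the target from the first frame using a hypothetical
    box=(row, col, h, w), then match that fixed template in later frames."""
    r, c, h, w = box
    template = frames[0][r:r + h, c:c + w].astype(float)
    return [locate_target(template, f.astype(float)) for f in frames[1:]]


# Usage: a random 5x5 pattern drifting across a blank 32x32 background.
rng = np.random.default_rng(0)
pattern = rng.random((5, 5))
frames = []
for shift in range(4):
    f = np.zeros((32, 32))
    f[10 + shift:15 + shift, 8 + 2 * shift:13 + 2 * shift] = pattern
    frames.append(f)

print(track(frames, (10, 8, 5, 5)))  # [(11, 10), (12, 12), (13, 14)]
```

Keeping the first-frame template fixed mirrors the task definition in the abstract (localization "based on its initial appearance in the first frame"); practical trackers replace raw pixels with learned features so that matching survives appearance changes.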
Citation impact
- FWCI: 24.65
- Percentile: 100%
- References: 140
Authors: 11
Topics & keywords
- Foundation (evidence)
- Computer science
- Object (grammar)
- Artificial intelligence
- Tracking (education)
- Computer vision
- Psychology
- History