Visual Prompt Multi-Modal Tracking
Dalian University of Technology · Peng Cheng Laboratory
Abstract
Visible-modal object tracking has given rise to a series of downstream multi-modal tracking branches. To inherit the powerful representations of the foundation model, a natural approach to multi-modal tracking is full fine-tuning of the RGB-based parameters. Although effective, this approach is suboptimal because downstream data are scarce and the fine-tuned model transfers poorly. In this paper, inspired by the recent success of prompt learning in language models, we develop Visual Prompt multi-modal Tracking (ViPT), which learns modal-relevant prompts to adapt the frozen pre-trained foundation model to various downstream multi-modal tracking tasks. ViPT finds a better way to stimulate the knowledge of the…
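The core idea in the abstract, learning a small prompt while the pre-trained model stays frozen, can be illustrated with a toy sketch. This is not the ViPT architecture or its code. The linear "foundation model", the per-feature prompt scale, the sample data, and every function name below are assumptions made up for illustration. The sketch only shows the training pattern: gradients update the prompt parameters and nothing else.

```python
# Toy illustration only: every name and number here is hypothetical,
# not taken from the ViPT paper or its code.

# Stand-in for pre-trained RGB weights; these are never updated.
FROZEN_W = (0.5, -1.0, 2.0)


def frozen_model(features):
    """Frozen 'foundation model': a fixed linear score over the features."""
    return sum(w * f for w, f in zip(FROZEN_W, features))


def prompted_forward(rgb, aux, prompt_scale):
    """Add a learned prompt (scaled auxiliary-modality features) to the RGB input."""
    prompted = [r + s * a for r, a, s in zip(rgb, aux, prompt_scale)]
    return frozen_model(prompted)


def total_loss(samples, prompt_scale):
    """Sum of squared errors over all (rgb, aux, target) samples."""
    return sum(
        (prompted_forward(rgb, aux, prompt_scale) - target) ** 2
        for rgb, aux, target in samples
    )


def train_prompt(samples, steps=500, lr=0.05):
    """Plain SGD on the prompt parameters only; FROZEN_W is read, never written."""
    prompt_scale = [0.0] * len(FROZEN_W)
    for _ in range(steps):
        for rgb, aux, target in samples:
            err = prompted_forward(rgb, aux, prompt_scale) - target
            # d(err^2)/d(prompt_scale[i]) = 2 * err * FROZEN_W[i] * aux[i]
            for i, (w, a) in enumerate(zip(FROZEN_W, aux)):
                prompt_scale[i] -= lr * 2.0 * err * w * a
    return prompt_scale


def make_samples():
    """Synthetic data whose targets come from a known 'true' prompt scale."""
    true_scale = [1.0, 0.5, -0.5]
    inputs = [
        ([0.2, 0.1, 0.3], [1.0, 0.0, 0.0]),
        ([0.4, 0.0, 0.1], [0.0, 1.0, 0.0]),
        ([0.1, 0.3, 0.2], [0.0, 0.0, 1.0]),
    ]
    return [(rgb, aux, prompted_forward(rgb, aux, true_scale)) for rgb, aux in inputs]


if __name__ == "__main__":
    samples = make_samples()
    learned = train_prompt(samples)
    print("learned prompt scale:", learned)
    print("final loss:", total_loss(samples, learned))
```

In this toy setting the trainable state is three numbers, while the stand-in backbone is untouched. That is the same division of labor the abstract describes, at a vastly smaller scale.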
Citation impact
- FWCI: 33.62
- Percentile: 100%
- References: 91
Authors
- 5
Topics & keywords
- Computer science
- Modal
- RGB color model
- Artificial intelligence
- Tracking (education)
- Downstream (manufacturing)
- Computer vision
- Eye tracking
- No poverty