Article | Jun 1, 2016 | Closed access
MSR-VTT: A Large Video Description Dataset for Bridging Video and Language
Microsoft Research Asia (China)
Indexed in Crossref
Abstract
While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories yet diverse video content. In this paper we present MSR-VTT (standing for "MSR-Video to Text") which is a new…
Citation impact
1,735 total citations
- FWCI: 49.67
- Percentile: 100%
- References: 63
Authors: 4
Topics & keywords
Keywords
- Computer science
- Automatic summarization
- Closed captioning
- Bridging (networking)
- Vocabulary
- Sentence
- Benchmark (surveying)
- Task (project management)
UN Sustainable Development Goals
- Quality Education