MSR-VTT: A Large Video Description Dataset for Bridging Video and Language

Xu, Jun; Mei, Tao; Yao, Ting; Rui, Yong

doi:10.1109/cvpr.2016.571

Article; Jun 1, 2016; Closed access

Microsoft Research Asia (China)

Indexed in Crossref

Abstract

While there has been increasing interest in the task of describing video with natural language, current computer vision algorithms are still severely limited in terms of the variability and complexity of the videos and their associated language that they can recognize. This is in part due to the simplicity of current benchmarks, which mostly focus on specific fine-grained domains with limited videos and simple descriptions. While researchers have provided several benchmark datasets for image captioning, we are not aware of any large-scale video description dataset with comprehensive categories yet diverse video content. In this paper we present MSR-VTT (standing for "MSR-Video to Text") which is a new…

Citation impact

Total citations: 1,735
FWCI: 49.67
Percentile: 100%
References: 63

Authors: 4

Topics & keywords

Keywords

Computer science
Automatic summarization
Closed captioning
Bridging (networking)
Vocabulary
Sentence
Benchmark (surveying)
Task (project management)

UN Sustainable Development Goals

Quality Education