Article | Oct 1, 2019 | Closed access

VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

University of California, Santa Barbara

Indexed in Crossref

Abstract

We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSR-VTT dataset [64], VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description…

Citation impact

Total citations: 455
FWCI: 17.25
Percentile: 100%
References: 96

Authors

6

Topics & keywords

Keywords
  • Closed captioning
  • Computer science
  • Natural language processing
  • Context (archaeology)
  • Machine translation
  • Artificial intelligence
  • Scale (ratio)
  • Language model
UN Sustainable Development Goals
  • Quality Education