VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research
University of California, Santa Barbara
Abstract
We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely-used MSR-VTT dataset [64], VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description…
Citation impact
- FWCI
- 17.25
- Percentile
- 100%
- References
- 96
Authors
- 6
Topics & keywords
- Closed captioning
- Computer science
- Natural language processing
- Context (archaeology)
- Machine translation
- Artificial intelligence
- Scale (ratio)
- Language model
- Quality Education