VaTeX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research

Wang, Xin; Wu, Jiawei; Chen, Junkun; Li, Lei; Wang, Yuan‐Fang; Wang, William Yang

doi:10.1109/iccv.2019.00468

Article · Oct 1, 2019 · Closed access

University of California, Santa Barbara

Indexed in Crossref

Abstract

We present a new large-scale multilingual video description dataset, VATEX, which contains over 41,250 videos and 825,000 captions in both English and Chinese. Among the captions, there are over 206,000 English-Chinese parallel translation pairs. Compared to the widely used MSR-VTT dataset [64], VATEX is multilingual, larger, linguistically complex, and more diverse in terms of both video and natural language descriptions. We also introduce two tasks for video-and-language research based on VATEX: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description…

Citation impact

Total citations: 455
FWCI: 17.25
Percentile: 100%
References: 96

Topics & keywords

Keywords

Closed captioning
Computer science
Natural language processing
Context (archaeology)
Machine translation
Artificial intelligence
Scale (ratio)
Language model

UN Sustainable Development Goals

Quality Education

No related works found for this paper.