YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video

Real, Esteban; Shlens, Jonathon; Mazzocchi, Stefano; Pan, Xin; Vanhoucke, Vincent

doi:10.1109/cvpr.2017.789

Article · Jul 1, 2017 · Closed access


Google (United States)

Indexed in Crossref

Abstract

We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB). The data set consists of approximately 380,000 video segments about 19s long, automatically selected to feature objects in natural settings without editing or post-processing, with a recording quality often akin to that of a hand-held cell phone camera. The objects represent a subset of the COCO [32] label set. All video segments were human-annotated with high-precision classification labels and bounding boxes at 1 frame per second. The use of a cascade of increasingly precise human annotations ensures a label accuracy above 95% for every class and tight bounding…

Citation impact

Total citations: 564

FWCI: 15.16
Percentile: 100%
References: 62

Authors: 5

Topics & keywords

Keywords

Computer science
Artificial intelligence
Minimum bounding box
Data set
Set (abstract data type)
Video tracking
Frame (networking)
Object (grammar)
