Voxceleb: Large-scale speaker verification in the wild

Nagrani, Arsha; Chung, Joon Son; Xie, Weidi; Zisserman, Andrew

doi:10.1016/j.csl.2019.101027

Article · Computer Speech & Language · Oct 16, 2019 · Hybrid OA

Arsha Nagrani · Joon Son Chung · Weidi Xie · Andrew Zisserman

University of Oxford · Naver (South Korea)

Indexed in Crossref

Abstract

The objective of this work is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual dataset collected from open-source media using a fully automated pipeline. Most existing datasets for speaker identification contain samples obtained under quite constrained conditions, and usually require manual annotations, hence are limited in size. We propose a pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN); and confirming the…

Citation impact

647 total citations

FWCI: 49.75
Percentile: 100%
References: 131

Authors: 4

Topics & keywords

Keywords

Computer science
Pipeline (software)
Convolutional neural network
Margin (machine learning)
Speaker recognition
Speech recognition
Identity (music)
Artificial intelligence

UN Sustainable Development Goals

Quality Education

Funding

Engineering and Physical Sciences Research Council
Award: EP/M013774/1