Article · Computer Speech & Language · Oct 16, 2019 · Hybrid OA

VoxCeleb: Large-scale speaker verification in the wild

University of Oxford · Naver (South Korea)

Indexed in Crossref

Abstract

The objective of this work is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual dataset collected from open-source media using a fully automated pipeline. Most existing datasets for speaker identification contain samples obtained under quite constrained conditions and usually require manual annotations, hence are limited in size. We propose a pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube, performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN), and confirming the…

Citation impact

Total citations: 647
FWCI: 49.75
Percentile: 100%
References: 131

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Pipeline (software)
  • Convolutional neural network
  • Margin (machine learning)
  • Speaker recognition
  • Speech recognition
  • Identity (music)
  • Artificial intelligence
UN Sustainable Development Goals
  • Quality Education

Funding