VoxCeleb: Large-scale speaker verification in the wild
University of Oxford · Naver (South Korea)
Abstract
The objective of this work is speaker recognition under noisy and unconstrained conditions. We make two key contributions. First, we introduce a very large-scale audio-visual dataset collected from open-source media using a fully automated pipeline. Most existing datasets for speaker identification contain samples obtained under quite constrained conditions and usually require manual annotation, so they are limited in size. We propose a pipeline based on computer vision techniques to create the dataset from open-source media. Our pipeline involves obtaining videos from YouTube; performing active speaker verification using a two-stream synchronization Convolutional Neural Network (CNN); and confirming the…
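The staged structure described in the abstract can be pictured as a chain of filters over candidate video segments. The following is only a minimal sketch under stated assumptions, not the authors' implementation: `Segment`, `is_active_speaker`, `run_pipeline`, the `sync_score` field, and the 0.5 threshold are all hypothetical names and values introduced for illustration. The synchronization score is a stored placeholder rather than the output of a trained network, and the final confirmation stage is left out because the abstract is truncated before describing it.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Segment:
    video_id: str      # hypothetical identifier of the source video
    start: float       # segment start time in seconds
    end: float         # segment end time in seconds
    sync_score: float  # placeholder for an audio-visual synchronization confidence


# A stage inspects one segment and reports whether it should be kept.
Stage = Callable[[Segment], bool]


def is_active_speaker(seg: Segment, threshold: float = 0.5) -> bool:
    """Stand-in for the active speaker check (assumption: a real system
    would score audio/lip synchronization with a trained model instead
    of reading a precomputed field)."""
    return seg.sync_score >= threshold


def run_pipeline(segments: Iterable[Segment], stages: List[Stage]) -> List[Segment]:
    """Keep only the segments that pass every stage, preserving input order."""
    kept = []
    for seg in segments:
        if all(stage(seg) for stage in stages):
            kept.append(seg)
    return kept


# Made-up candidate segments, for illustration only.
candidates = [
    Segment("vid_a", 0.0, 4.2, sync_score=0.91),
    Segment("vid_a", 4.2, 9.0, sync_score=0.12),
    Segment("vid_b", 1.0, 6.5, sync_score=0.77),
]

accepted = run_pipeline(candidates, [is_active_speaker])
print([(s.video_id, s.start) for s in accepted])
```

Modelling each stage as a plain predicate keeps the stages independent, so a further check can be appended to the `stages` list without changing `run_pipeline`.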
Citation impact
- FWCI: 49.75
- Percentile: 100%
- References: 131
Authors
- 4
Topics & keywords
- Computer science
- Pipeline (software)
- Convolutional neural network
- Margin (machine learning)
- Speaker recognition
- Speech recognition
- Identity (music)
- Artificial intelligence
- Quality Education