Speaker Diarization: A Review of Recent Research

Anguera, Xavier; Bozonnet, Simon; Evans, Nicholas; Fredouille, Corinne; Friedland, Gerald; Vinyals, Oriol

doi:10.1109/tasl.2011.2125954

Review · IEEE Transactions on Audio, Speech, and Language Processing · Jan 31, 2012 · Closed access

Telefónica (Spain) · EURECOM · +3 more institutions

Indexed in Crossref

Abstract

Speaker diarization is the task of determining “who spoke when?” in an audio or video recording that contains an unknown amount of speech and also an unknown number of speakers. Initially, it was proposed as a research topic related to automatic speech recognition, where speaker diarization serves as an upstream processing step. Over recent years, however, speaker diarization has become an important key technology for many tasks, such as navigation, retrieval, or higher level inference on audio data. Accordingly, many important improvements in accuracy and robustness have been reported in journals and conferences in the area. The application domains, from broadcast news, to lectures and meetings, vary greatly…

Citation impact

678 total citations

FWCI: 32.19
Percentile: 100%
References: 165

Authors

6

Topics & keywords

Keywords

Speaker diarisation
Computer science
NIST
Transcription (linguistics)
Speech recognition
Speaker recognition
Inference
Speech technology

UN Sustainable Development Goals

Quality Education

Funding

National Institute of Standards and Technology