Speaker adaptation of neural network acoustic models using i-vectors

Saon, George; Soltau, Hagen; Nahamoo, D.; Picheny, Michael

doi:10.1109/asru.2013.6707705

Article · Dec 1, 2013 · Closed access

George Saon, Hagen Soltau, D. Nahamoo, Michael Picheny

IBM (United States)

Indexed in Crossref

Abstract

We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the network in parallel with the regular acoustic features for ASR. For both training and test, the i-vector for a given speaker is concatenated to every frame belonging to that speaker and changes across different speakers. Experimental results on a Switchboard 300 hours corpus show that DNNs trained on speaker independent features and i-vectors achieve a 10% relative improvement in word error rate (WER) over networks trained on speaker independent features only. These networks are comparable in performance to DNNs trained on speaker-adapted features (with…
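The input construction described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy feature dimensions, and the speaker labels are all hypothetical, and plain Python lists are used only to keep the example dependency-free. It shows the one idea stated above, that every frame from a speaker receives that speaker's i-vector, and that the appended vector changes only when the speaker changes.

```python
from typing import Dict, List, Sequence, Tuple

Frame = List[float]


def append_ivector(frames: Sequence[Sequence[float]],
                   ivector: Sequence[float]) -> List[Frame]:
    """Concatenate the same speaker i-vector to every acoustic frame."""
    return [list(frame) + list(ivector) for frame in frames]


def build_network_inputs(
    utterances: Sequence[Tuple[str, Sequence[Sequence[float]]]],
    ivectors: Dict[str, Sequence[float]],
) -> List[Frame]:
    """Build DNN input vectors for a list of (speaker_id, frames) pairs.

    The i-vector is constant within a speaker and differs across speakers.
    """
    inputs: List[Frame] = []
    for speaker_id, frames in utterances:
        inputs.extend(append_ivector(frames, ivectors[speaker_id]))
    return inputs


# Toy data (hypothetical sizes): 3-dim acoustic features, 2-dim i-vectors.
ivectors = {"spk_a": [0.1, -0.2], "spk_b": [0.7, 0.3]}
utterances = [
    ("spk_a", [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]),
    ("spk_b", [[7.0, 8.0, 9.0]]),
]
inputs = build_network_inputs(utterances, ivectors)
```

The same routine would apply at both training and test time, since the abstract states that the i-vector is appended in both cases. The dimensions here are far smaller than those of a real system.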

Citation impact

Total citations: 606

FWCI: 95.91
Percentile: 100%
References: 24

Topics & keywords

Keywords

Computer science
Speech recognition
Speaker recognition
Artificial neural network
Word error rate
Speaker diarisation
Decoding methods
Identity (music)

UN Sustainable Development Goals

Quality Education

Funding

Vysoké Učení Technické v Brně