Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

Kolbæk, Morten; Yu, Dong; Tan, Zheng‐Hua; Jensen, Jesper

doi:10.1109/taslp.2017.2726762

Article · IEEE/ACM Transactions on Audio Speech and Language Processing · Jul 13, 2017 · Closed access

Aalborg University · Microsoft (United States) · +1 more institution

Indexed in Crossref

Abstract

In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output…
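The idea of an utterance-level cost can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a mean-squared error between estimated and target spectra stored as arrays of shape (speakers, frames, frequency bins); the function name `upit_loss` and that array layout are hypothetical choices made for illustration. The key point is that a single output-to-speaker permutation is selected for the whole utterance rather than one per frame.

```python
from itertools import permutations

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level permutation-invariant MSE (illustrative sketch).

    estimates, targets: arrays of shape (speakers, frames, freq_bins).
    Returns (loss, perm), where perm[s] is the index of the network
    output assigned to target speaker s for the entire utterance.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        # Reorder the outputs, then average the error over ALL frames,
        # so one permutation is scored against the whole utterance.
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: two speakers whose outputs come out in swapped order.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 5, 3))
loss, perm = upit_loss(targets[::-1].copy(), targets)
```

Because the minimum is taken once per utterance, each output stays tied to one speaker across frames, which is why no extra permutation step is needed at inference. The exhaustive search costs S! evaluations for S speakers, which is manageable only for small S.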

Citation impact

Total citations: 856

FWCI: 60.29
Percentile: 100%
References: 66

Authors: 4

Topics & keywords

Keywords

Speech recognition
Computer science
Utterance
Permutation (music)
Invariant (physics)
Artificial intelligence
Artificial neural network
Inference

Funding

Oticon Fonden