Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks

Aalborg University · Microsoft (United States) · +1 more institution

Indexed in Crossref

Abstract

In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output…
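
The utterance-level cost described in the abstract can be illustrated with a minimal sketch. It assumes a plain mean-squared error between estimated and reference feature arrays of shape (speakers, frames, features); the paper's exact cost function and network are not given in this abstract, and the name `upit_loss` is hypothetical. The point it shows is that a single output-to-speaker permutation is chosen for the whole utterance, rather than one per frame.

```python
from itertools import permutations

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level permutation-invariant MSE (illustrative sketch only).

    estimates, targets: arrays of shape (num_speakers, num_frames, num_features).
    Returns the smallest error over all output-to-speaker assignments and the
    permutation that achieves it, where best_perm[i] is the output stream
    assigned to target speaker i.
    """
    num_speakers = targets.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        # The same permutation is applied to every frame of the utterance.
        loss = float(np.mean((estimates[list(perm)] - targets) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.normal(size=(2, 5, 3))   # 2 speakers, 5 frames, 3 features
    estimates = targets[::-1].copy()       # outputs arrive in swapped order
    print(upit_loss(estimates, targets))
```

Because the permutation is fixed across all frames during training, each output stream is pushed to follow one speaker for the entire utterance, which is why no per-frame reassignment is needed at inference time. The exhaustive search grows factorially with the number of speakers, so this form is only practical for small speaker counts.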

Citation impact

  • Total citations: 856
  • FWCI: 60.29
  • Percentile: 100%
  • References: 66

Authors

4

Topics & keywords

Keywords
  • Speech recognition
  • Computer science
  • Utterance
  • Permutation (music)
  • Invariant (physics)
  • Artificial intelligence
  • Artificial neural network
  • Inference

Funding