Multitalker Speech Separation With Utterance-Level Permutation Invariant Training of Deep Recurrent Neural Networks
Aalborg University · Microsoft (United States) · +1 more institution
Abstract
In this paper, we propose the utterance-level permutation invariant training (uPIT) technique. uPIT is a practically applicable, end-to-end, deep-learning-based solution for speaker independent multitalker speech separation. Specifically, uPIT extends the recently proposed permutation invariant training (PIT) technique with an utterance-level cost function, hence eliminating the need for solving an additional permutation problem during inference, which is otherwise required by frame-level PIT. We achieve this using recurrent neural networks (RNNs) that, during training, minimize the utterance-level separation error, hence forcing separated frames belonging to the same speaker to be aligned to the same output…
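The utterance-level cost described in the abstract can be illustrated with a minimal sketch. It assumes a plain mean-squared error between estimated and reference spectrograms; the function name `upit_loss`, the array layout `(speakers, frames, features)`, and the choice of NumPy are illustrative assumptions, not taken from the paper. The point shown is that a single output-to-speaker assignment is chosen for the whole utterance, rather than one per frame.

```python
from itertools import permutations

import numpy as np


def upit_loss(estimates, targets):
    """Utterance-level permutation-invariant MSE (illustrative sketch).

    estimates, targets: arrays of shape (speakers, frames, features).
    Returns the lowest error over all output-to-speaker assignments,
    together with the assignment that achieves it.
    """
    num_speakers = estimates.shape[0]
    best_loss, best_perm = np.inf, None
    for perm in permutations(range(num_speakers)):
        # Output perm[s] is compared with reference speaker s, and the
        # error is averaged over every frame of the utterance at once.
        loss = np.mean((estimates[list(perm)] - targets) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two-speaker example: the outputs are the references in swapped order.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 5, 3))
estimates = targets[::-1].copy()
loss, perm = upit_loss(estimates, targets)
```

Because the assignment is fixed across all frames, each output stream stays tied to one speaker, which is why no separate frame-by-frame permutation step is needed at inference. The exhaustive search grows factorially with the number of speakers, so this sketch is only practical for small speaker counts.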
Citation impact
- FWCI: 60.29
- Percentile: 100%
- References: 66
Authors: 4
Topics & keywords
- Speech recognition
- Computer science
- Utterance
- Permutation (music)
- Invariant (physics)
- Artificial intelligence
- Artificial neural network
- Inference