Permutation invariant training of deep models for speaker-independent multi-talker speech separation
Microsoft (United States) · Aalborg University
Abstract
We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-lasting label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and…
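To illustrate the idea of minimizing the separation error directly while staying invariant to the order of the outputs, here is a minimal sketch. It rests on assumptions not stated in the abstract: the separation error is taken to be mean squared error, the loss is the minimum over all assignments of outputs to speakers, and the inputs are plain arrays rather than the output of a trained network. The function name `pit_mse` is hypothetical.

```python
import itertools

import numpy as np


def pit_mse(estimates, targets):
    """Permutation-invariant MSE (illustrative sketch, not the paper's code).

    estimates, targets: arrays of shape (n_sources, ...) with matching shapes.
    Returns (lowest_loss, best_perm), where best_perm[i] is the index of the
    target assigned to estimate i.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_sources = targets.shape[0]

    best_loss, best_perm = float("inf"), None
    # Try every assignment of targets to outputs and keep the cheapest one.
    # Assumption: exhaustive search, which is only practical for a few sources.
    for perm in itertools.permutations(range(n_sources)):
        loss = float(np.mean((estimates - targets[list(perm)]) ** 2))
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Two "speakers" whose estimates come out in swapped order.
targets = np.array([[1.0, 1.0], [0.0, 0.0]])
estimates = np.array([[0.0, 0.0], [1.0, 1.0]])
loss, perm = pit_mse(estimates, targets)
```

In this toy case a fixed output order would report an error of 1.0, whereas the permutation-invariant loss is 0.0 with the assignment `(1, 0)`, which is the sense in which the output order stops mattering.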
Citation impact
- FWCI: 63.77
- Percentile: 100%
- References: 30
Authors
- 4
Topics & keywords
- Computer science
- Speech recognition
- Non-negative matrix factorization
- Permutation (music)
- Artificial intelligence
- Cluster analysis
- Invariant (physics)
- Deep learning
- Peace, Justice and strong institutions