Article · Mar 1, 2017 · Closed access

Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Microsoft (United States) · Aalborg University

Indexed in Crossref

Abstract

We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-standing label permutation problem that has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and…
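The idea of minimizing the separation error directly while ignoring the order of the output streams can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the error measure or the search procedure, so the mean squared error, the exhaustive search over permutations, the array shape (sources, frames, bins), and the function name `pit_mse` are all assumptions made for illustration.

```python
from itertools import permutations

import numpy as np


def pit_mse(estimates, targets):
    """Hypothetical permutation-invariant MSE (illustrative only).

    estimates, targets: arrays of shape (S, T, F), assumed here to mean
    S sources, T frames, F frequency bins.
    Returns (lowest_loss, best_perm), where best_perm[i] is the index of
    the target assigned to output stream i.
    """
    estimates = np.asarray(estimates, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n_src = estimates.shape[0]

    best_loss, best_perm = np.inf, None
    # Try every output-to-speaker assignment and keep the cheapest one.
    for perm in permutations(range(n_src)):
        loss = np.mean((estimates - targets[list(perm)]) ** 2)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Usage: outputs that are perfect but in swapped order incur no penalty.
rng = np.random.default_rng(0)
targets = rng.standard_normal((2, 5, 3))
estimates = targets[::-1]  # same streams, opposite order
loss, perm = pit_mse(estimates, targets)
```

Because the loss is taken over the best assignment, the swapped outputs in the usage example are scored as a perfect separation. The exhaustive search costs S! evaluations, which is only practical for a small number of talkers.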

Citation impact

Total citations: 873
FWCI: 63.77
Percentile: 100%
References: 30

Authors

4

Topics & keywords

Keywords
  • Computer science
  • Speech recognition
  • Non-negative matrix factorization
  • Permutation (music)
  • Artificial intelligence
  • Cluster analysis
  • Invariant (physics)
  • Deep learning
UN Sustainable Development Goals
  • Peace, Justice and strong institutions

Funding