Permutation invariant training of deep models for speaker-independent multi-talker speech separation

Yu, Dong; Kolbæk, Morten; Tan, Zheng‐Hua; Jensen, Jesper

doi:10.1109/icassp.2017.7952154

Article · Mar 1, 2017 · Closed access



Microsoft (United States) · Aalborg University

Indexed in Crossref

Abstract

We propose a novel deep learning training criterion, named permutation invariant training (PIT), for speaker-independent multi-talker speech separation, commonly known as the cocktail-party problem. Unlike the multi-class regression technique and the deep clustering (DPCL) technique, our approach minimizes the separation error directly. This strategy effectively solves the long-standing label permutation problem, which has prevented progress on deep-learning-based techniques for speech separation. We evaluated PIT on the WSJ0 and Danish mixed-speech separation tasks and found that it compares favorably to non-negative matrix factorization (NMF), computational auditory scene analysis (CASA), and…
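The core idea of PIT, as described in the abstract, is to score the network's outputs against the reference sources under every possible output-to-speaker assignment and train on the lowest error, so the arbitrary ordering of target labels no longer matters. The sketch below illustrates that idea with a plain mean-squared-error criterion over flat feature vectors. It is an illustrative assumption, not the authors' exact formulation: the function name `pit_mse` and the choice of features and normalization are hypothetical.

```python
from itertools import permutations
from typing import Sequence, Tuple


def pit_mse(estimates: Sequence[Sequence[float]],
            references: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, ...]]:
    """Hypothetical permutation-invariant MSE (illustrative sketch only).

    Tries every assignment of estimated sources to reference sources and
    returns the smallest mean squared error together with the assignment
    that achieves it. perm[s] is the index of the estimate matched to
    reference s.
    """
    n = len(estimates)
    if n != len(references):
        raise ValueError("need exactly one estimate per reference source")
    for est, ref in zip(estimates, references):
        if len(est) != len(ref):
            raise ValueError("estimates and references must have equal lengths")
    num_values = sum(len(ref) for ref in references)
    if num_values == 0:
        raise ValueError("sources must not be empty")

    best_loss, best_perm = float("inf"), tuple(range(n))
    for perm in permutations(range(n)):
        # Sum squared errors for this particular output-to-speaker assignment.
        total = 0.0
        for s, e in enumerate(perm):
            for est_val, ref_val in zip(estimates[e], references[s]):
                total += (est_val - ref_val) ** 2
        loss = total / num_values
        # Keep the assignment with the lowest error; training would
        # backpropagate only this minimum.
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


if __name__ == "__main__":
    refs = [[1.0, 0.0], [0.0, 1.0]]
    # The network output the right sources, but in swapped order.
    ests = [[0.0, 1.0], [1.0, 0.0]]
    print(pit_mse(ests, refs))  # (0.0, (1, 0))
```

In the usage example a fixed-order loss would penalize the correctly separated but swapped outputs (MSE of 1.0), whereas the permutation-invariant version reports zero error for the swapped assignment. Enumerating all n! assignments is inexpensive for a small number of talkers but grows factorially as the number of sources increases.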

Citation impact

Total citations: 873
FWCI: 63.77
Percentile: 100%
References: 30


Topics & keywords

Keywords

Computer science
Speech recognition
Non-negative matrix factorization
Permutation
Artificial intelligence
Cluster analysis
Invariant (mathematics)
Deep learning



Funding

Mitsubishi Electric Research Laboratories