On Training Targets for Supervised Speech Separation

Wang, Yuxuan; Narayanan, Arun; Wang, DeLiang

doi:10.1109/taslp.2014.2352935

Article | IEEE/ACM Transactions on Audio, Speech, and Language Processing | Aug 28, 2014 | Closed access

The Ohio State University

Indexed in: Crossref, PubMed

Abstract

Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results by using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform…
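To make two of the targets named in the abstract concrete, the sketch below computes an ideal binary mask and an ideal ratio mask from per-unit speech and noise energies. It follows commonly used definitions rather than anything stated on this page: the IBM marks a time-frequency unit as 1 when its local SNR exceeds a threshold, and the IRM is the speech share of the total energy raised to an exponent. The function names, the 0 dB threshold `lc_db`, the exponent `beta=0.5`, and the toy energy values are all assumptions chosen for illustration.

```python
import math

FLOOR = 1e-12  # small floor to avoid log(0) and division by zero (assumption)


def ideal_binary_mask(speech_energy, noise_energy, lc_db=0.0):
    """Return 1 where the local SNR in dB exceeds lc_db, else 0.

    Inputs are nested lists indexed [time][frequency] holding the
    premixed speech and noise energies of each time-frequency unit.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10(max(s, FLOOR) / max(n, FLOOR))
            row.append(1 if snr_db > lc_db else 0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech_energy, noise_energy, beta=0.5):
    """Return (S / (S + N)) ** beta per unit, a soft mask in [0, 1]."""
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


# Toy 2x2 example: two time frames, two frequency channels (made-up values).
speech = [[4.0, 1.0], [0.0, 9.0]]
noise = [[1.0, 1.0], [2.0, 1.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In a supervised setup of the kind the abstract describes, a mask like `ibm` or `irm` would serve as the training label for the corresponding noisy input frame. The binary mask makes a hard keep-or-discard decision per unit, while the ratio mask keeps a graded value.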

Citation impact

Total citations: 1,120

FWCI: 43.33
Percentile: 100%
References: 46

Authors: 3

Topics & keywords

Keywords

Computer science
Intelligibility (philosophy)
Fast Fourier transform
Binary number
Artificial intelligence
Speech recognition
Pattern recognition (psychology)
Speech enhancement

UN Sustainable Development Goals

Peace, Justice and Strong Institutions
