On Training Targets for Supervised Speech Separation

The Ohio State University

Indexed in: Crossref, PubMed

Abstract

Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results by using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform…
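The two most familiar targets named in the abstract can be sketched in a few lines. This is a minimal illustration and not the authors' implementation. It assumes the commonly used definitions: the IBM is 1 where the local signal-to-noise ratio exceeds a local criterion and 0 elsewhere, and the IRM is the ratio of speech energy to total energy raised to an exponent. The function names, the `lc_db` and `beta` parameters, their default values, and the toy energy values are all hypothetical choices made for this example.

```python
import math

EPS = 1e-12  # assumed small constant; guards against division by zero and log(0)


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0):
    """Return 1.0 where the local SNR in dB exceeds lc_db, else 0.0.

    Inputs are nested lists of per-unit energies (rows = frequency
    channels, columns = time frames). lc_db is an assumed default.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            snr_db = 10.0 * math.log10((s + EPS) / (n + EPS))
            row.append(1.0 if snr_db > lc_db else 0.0)
        mask.append(row)
    return mask


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """Return (S / (S + N)) ** beta per time-frequency unit.

    beta = 0.5 is an assumed default, not taken from the abstract.
    """
    return [
        [(s / (s + n + EPS)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


# Toy 2x2 energies (hypothetical values, for illustration only).
speech = [[4.0, 0.1], [1.0, 9.0]]
noise = [[1.0, 1.0], [1.0, 1.0]]

ibm = ideal_binary_mask(speech, noise)
irm = ideal_ratio_mask(speech, noise)
```

In a supervised setup of the kind the abstract describes, masks like these would be computed from premixed speech and noise and used as the regression or classification target for each time-frequency unit. The binary mask makes a hard decision per unit, while the ratio mask keeps a soft value between 0 and 1.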

Citation impact

Total citations: 1,120
FWCI: 43.33
Percentile: 100%
References: 46

Authors: 3

Topics & keywords

Keywords
  • Computer science
  • Intelligibility (philosophy)
  • Fast Fourier transform
  • Binary number
  • Artificial intelligence
  • Speech recognition
  • Pattern recognition (psychology)
  • Speech enhancement
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions