TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Carnegie Mellon University · Marche Polytechnic University · +1 more institution

Indexed in Crossref

Abstract

We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation and dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a…
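The abstract reports results as an improvement in SI-SDR. As a rough illustration, the sketch below computes SI-SDR using the commonly cited definition (project the estimate onto the target, then compare the energy of the scaled target with the energy of the residual). The function names `si_sdr` and `si_sdr_improvement`, the `eps` stabilizer, and the use of plain Python lists are assumptions made for this example. They are not taken from the paper or its code.

```python
import math


def si_sdr(estimate, target, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Illustrative sketch only; `eps` is an assumed guard against division by zero.
    """
    if len(estimate) != len(target):
        raise ValueError("estimate and target must have the same length")
    # Optimal scaling of the target: alpha = <estimate, target> / ||target||^2
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    alpha = dot / (target_energy + eps)
    scaled = [alpha * t for t in target]
    # Energy of the scaled target versus energy of the residual
    signal = sum(s * s for s in scaled)
    noise = sum((s - e) ** 2 for s, e in zip(scaled, estimate))
    return 10.0 * math.log10((signal + eps) / (noise + eps))


def si_sdr_improvement(estimate, mixture, target):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)
```

Because the target is rescaled before the comparison, multiplying the estimate by a nonzero constant leaves the score essentially unchanged, which is what makes the metric scale-invariant.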

Citation impact

Total citations: 173
FWCI: 32.83
Percentile: 100%
References: 99

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Speech recognition
  • Reverberation
  • Microphone
  • Robustness (evolution)
  • Source separation
  • Microphone array
  • Monaural
UN Sustainable Development Goals
  • Peace, Justice and Strong Institutions

Funding