TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation
Carnegie Mellon University · Marche Polytechnic University · +1 more institution
Abstract
We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation or dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a…
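The two quantities the abstract relies on, stacked RI features for complex spectral mapping and the SI-SDR improvement metric, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the STFT settings (512-point Hann window, hop of 128), the helper names `stft`, `stack_ri`, `unstack_ri`, `si_sdr`, and `si_sdr_improvement`, and the random stand-in signals are all hypothetical choices made for the example. It shows only the input/target representation and the evaluation metric; no part of the TF-GridNet network itself is reproduced.

```python
import numpy as np


def stft(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Minimal Hann-windowed STFT -> complex array (frames, n_fft // 2 + 1).

    Window and hop sizes are illustrative assumptions, not the paper's setup.
    """
    window = np.hanning(n_fft)
    frames = np.lib.stride_tricks.sliding_window_view(x, n_fft)[::hop]
    return np.fft.rfft(frames * window, axis=-1)


def stack_ri(spec: np.ndarray) -> np.ndarray:
    """Stack real and imaginary parts as channels: (T, F) complex -> (2, T, F) real."""
    return np.stack([spec.real, spec.imag], axis=0)


def unstack_ri(ri: np.ndarray) -> np.ndarray:
    """Inverse of stack_ri: (2, T, F) real -> (T, F) complex."""
    return ri[0] + 1j * ri[1]


def si_sdr(estimate: np.ndarray, reference: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB, computed on zero-mean signals."""
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Optimal scaling of the reference that best explains the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.dot(target, target) + eps) / (np.dot(residual, residual) + eps))


def si_sdr_improvement(estimate, reference, mixture) -> float:
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Hypothetical stand-ins for two 1-second, 16 kHz speakers.
rng = np.random.default_rng(0)
s1, s2 = rng.standard_normal((2, 16000))
mixture = s1 + s2

# Complex spectral mapping: the network input is the mixture's stacked RI
# components (2, T, F); the regression target stacks each speaker's RI
# components, here (4, T, F) for two speakers.
features = stack_ri(stft(mixture))
targets = np.concatenate([stack_ri(stft(s)) for s in (s1, s2)], axis=0)

# A hypothetical near-perfect estimate of speaker 1 with slight leakage.
estimate = s1 + 0.01 * s2
gain_db = si_sdr_improvement(estimate, s1, mixture)
```

Because the STFT is linear, the mixture's RI features equal the sum of the per-speaker RI targets, so the mapping from input channels to target channels is well defined in the complex domain. SI-SDR is unchanged when the estimate is rescaled, which is why it is preferred over plain SDR for comparing separation systems whose output gain is arbitrary.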
Citation impact
- FWCI: 32.83
- Percentile: 100%
- References: 99
Authors: 6
Topics & keywords
- Computer science
- Speech recognition
- Reverberation
- Microphone
- Robustness (evolution)
- Source separation
- Microphone array
- Monaural