TF-GridNet: Integrating Full- and Sub-Band Modeling for Speech Separation

Wang, Zhong-Qiu; Cornell, Samuele; Choi, Shukjae; Lee, Younglo; Kim, Byeong-Yeol; Watanabe, Shinji

doi:10.1109/taslp.2023.3304482

Article · IEEE/ACM Transactions on Audio, Speech, and Language Processing · Jan 1, 2023 · Closed access

Carnegie Mellon University · Marche Polytechnic University · +1 more institution

Indexed in Crossref

Abstract

We propose TF-GridNet for speech separation. The model is a novel deep neural network (DNN) integrating full- and sub-band modeling in the time-frequency (T-F) domain. It stacks several blocks, each consisting of an intra-frame full-band module, a sub-band temporal module, and a cross-frame self-attention module. It is trained to perform complex spectral mapping, where the real and imaginary (RI) components of input signals are stacked as features to predict target RI components. We first evaluate it on monaural anechoic speaker separation. Without using data augmentation and dynamic mixing, it obtains a state-of-the-art 23.5 dB improvement in scale-invariant signal-to-distortion ratio (SI-SDR) on WSJ0-2mix, a…
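The abstract reports results as an improvement in scale-invariant signal-to-distortion ratio (SI-SDR). As a minimal sketch of how that metric is commonly computed, the pure-Python example below uses the standard definition: both signals are mean-removed, the estimate is projected onto the reference, and the energy of the projection is compared with the energy of the residual. The function names `si_sdr` and `si_sdr_improvement`, the `eps` stabilizer, and the toy signals are assumptions made for illustration; they are not taken from the paper or from its evaluation code.

```python
import math


def _remove_mean(x):
    """Return a zero-mean copy of a 1-D sequence of floats."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length 1-D signals.

    `eps` is an illustrative stabilizer that avoids division by zero.
    """
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    est = _remove_mean(estimate)
    ref = _remove_mean(reference)
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    # Whatever the projection does not explain is treated as distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the separated estimate minus SI-SDR of the unprocessed mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy signals (hypothetical): the interference is orthogonal to the reference.
reference = [1.0, -1.0, 1.0, -1.0]
interference = [1.0, 1.0, -1.0, -1.0]
mixture = [r + i for r, i in zip(reference, interference)]
estimate = [r + 0.1 * i for r, i in zip(reference, interference)]
improvement_db = si_sdr_improvement(estimate, mixture, reference)
```

In this toy case the mixture has an SI-SDR of about 0 dB and the estimate about 20 dB, so `improvement_db` is about 20 dB. Rescaling `estimate` by any nonzero constant leaves the score unchanged, which is the property the metric is named for.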

Citation impact

Total citations: 173
FWCI: 32.83
Percentile: 100%
References: 99

Authors: 6

Topics & keywords

Keywords

Computer science
Speech recognition
Reverberation
Microphone
Robustness (evolution)
Source separation
Microphone array
Monaural

UN Sustainable Development Goals

Peace, Justice and Strong Institutions

Funding

National Science Foundation of Sri Lanka
Award: OCI 2005572