Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation

Luo, Yi; Mesgarani, Nima

doi:10.1109/taslp.2019.2915167

Article; IEEE/ACM Transactions on Audio, Speech, and Language Processing; May 7, 2019; Bronze OA

Columbia University

Indexed in: arXiv, Crossref, PubMed

Abstract

Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency of the entire system. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech…

Citation impact

Total citations: 2,025
FWCI: 152.91
Percentile: 100%
References: 81

Authors: 2

Topics & keywords

Keywords

Magnitude (astronomy)
Ideal (ethics)
Masking (illustration)
Separation (statistics)
Speech recognition
Computer science
Mathematics
Physics

UN Sustainable Development Goals

Peace, Justice and strong institutions
