Conv-TasNet: Surpassing Ideal Time–Frequency Magnitude Masking for Speech Separation
Indexed in: arXiv, Crossref, PubMed
Abstract
Single-channel, speaker-independent speech separation methods have recently seen great progress. However, the accuracy, latency, and computational cost of such methods remain insufficient. The majority of the previous methods have formulated the separation problem through the time-frequency representation of the mixed signal, which has several drawbacks, including the decoupling of the phase and magnitude of the signal, the suboptimality of time-frequency representation for speech separation, and the long latency of the entire system. To address these shortcomings, we propose a fully-convolutional time-domain audio separation network (Conv-TasNet), a deep learning framework for end-to-end time-domain speech…
Citation impact
2,025 total citations
- FWCI: 152.91
- Percentile: 100%
- References: 81
Authors: 2
Topics & keywords
Keywords
- Magnitude (astronomy)
- Ideal (ethics)
- Masking (illustration)
- Separation (statistics)
- Speech recognition
- Computer science
- Mathematics
- Physics
UN Sustainable Development Goals
- Peace, Justice and strong institutions