WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Harbin Institute of Technology · Nankai University · +3 more institutions
Abstract
Self-supervised learning (SSL) has achieved great success in speech recognition, but it has seen limited exploration for other speech processing tasks. Because the speech signal contains multi-faceted information, including speaker identity, paralinguistics, and spoken content, learning universal representations for all speech tasks is challenging. To tackle this problem, we propose a new pre-trained model, WavLM, to solve full-stack downstream speech tasks. WavLM jointly learns masked speech prediction and denoising in pre-training. In this way, WavLM not only keeps its speech content modeling capability through masked speech prediction, but also improves its potential for non-ASR tasks by the speech…
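To make the idea of jointly learning masked speech prediction and denoising more concrete, here is a minimal NumPy sketch of what one training step might look like. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how noisy inputs are produced, so overlaying a second utterance is assumed here as one plausible way to simulate them. The per-frame discrete targets, the toy linear "encoder", and all function names (`mix_utterances`, `sample_span_mask`, `masked_prediction_loss`) are likewise hypothetical.

```python
import numpy as np


def mix_utterances(primary, secondary, rng, max_overlap=0.5, gain=0.5):
    """Overlay a scaled chunk of `secondary` onto a random region of `primary`.

    Assumption: this is one simple way to simulate a noisy/overlapped input.
    """
    n = len(primary)
    overlap = int(rng.integers(1, max(2, int(n * max_overlap) + 1)))
    overlap = min(overlap, len(secondary), n)
    start_p = int(rng.integers(0, n - overlap + 1))
    start_s = int(rng.integers(0, len(secondary) - overlap + 1))
    noisy = primary.copy()
    noisy[start_p:start_p + overlap] += gain * secondary[start_s:start_s + overlap]
    return noisy


def sample_span_mask(num_frames, rng, mask_prob=0.08, span_len=10):
    """Pick random span starts and mask `span_len` frames from each start."""
    mask = np.zeros(num_frames, dtype=bool)
    for s in np.flatnonzero(rng.random(num_frames) < mask_prob):
        mask[s:s + span_len] = True
    return mask


def masked_prediction_loss(logits, targets, mask):
    """Mean cross-entropy over masked frames only (0.0 if nothing is masked)."""
    if not mask.any():
        return 0.0
    z = logits[mask]
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(z)), targets[mask]].mean())


rng = np.random.default_rng(0)
num_frames, frame_len, num_classes = 50, 320, 8

clean = rng.standard_normal(num_frames * frame_len)   # stand-in for the target utterance
other = rng.standard_normal(num_frames * frame_len)   # stand-in for an interfering signal
# Placeholder targets: in practice these would be derived from the CLEAN utterance.
targets = rng.integers(0, num_classes, size=num_frames)

noisy = mix_utterances(clean, other, rng)
mask = sample_span_mask(num_frames, rng)

frames = noisy.reshape(num_frames, frame_len)
frames[mask] = 0.0  # stand-in for a learned mask embedding
# Crude context mixing so masked frames can "see" their neighbours (toy encoder).
context = (np.roll(frames, 1, axis=0) + frames + np.roll(frames, -1, axis=0)) / 3
weights = rng.standard_normal((frame_len, num_classes)) * 0.01
loss = masked_prediction_loss(context @ weights, targets, mask)
print(f"masked frames: {int(mask.sum())}, loss: {loss:.3f}")
```

The point of the sketch is the pairing: the model sees a corrupted and partially masked input, while the loss is computed only on masked frames against targets tied to the clean utterance, so predicting content and ignoring interference are trained together.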
Citation impact
- FWCI: 201.99
- Percentile: 100%
- References: 133
Authors: 19
Topics & keywords
- Computer science
- Speech recognition
- Speech processing
- Voice activity detection
- Speech enhancement
- Speech coding
- Sequence labeling
- Benchmark (surveying)
- Quality Education