Self-Supervised Speech Representation Learning: A Review
Menlo School · National Taiwan University · +6 more institutions
Abstract
Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main…
Citation impact
- FWCI: 42.59
- Percentile: 100%
- References: 377
Authors: 12
Topics & keywords
- Computer science
- Artificial intelligence
- Feature learning
- Natural language processing
- Speech processing
- Keyword spotting
- Multi-task learning
- Supervised learning
- Quality Education