Self-Supervised Speech Representation Learning: A Review

Mohamed, Abdelrahman; Lee, Hung-yi; Borgholt, Lasse; Havtorn, Jakob D.; Edin, Joakim; Igel, Christian; Kirchhoff, Katrin; Li, Shang-Wen; Livescu, Karen; Maaløe, Lars; Sainath, Tara N.; Watanabe, Shinji

doi:10.1109/jstsp.2022.3207050

Review · IEEE Journal of Selected Topics in Signal Processing · Sep 15, 2022 · Green OA

Menlo School · National Taiwan University · +6 more institutions

Indexed in: arXiv, Crossref

Abstract

Although supervised deep learning has revolutionized speech and audio processing, it has necessitated the building of specialist models for individual tasks and application scenarios. It is likewise difficult to apply this to dialects and languages for which only limited labeled data is available. Self-supervised representation learning methods promise a single universal model that would benefit a wide variety of tasks and domains. Such methods have shown success in natural language processing and computer vision domains, achieving new levels of performance while reducing the number of labels required for many downstream scenarios. Speech representation learning is experiencing similar progress in three main…

Citation impact

Total citations: 337

FWCI: 42.59
Percentile: 100%
References: 377

Authors: 12

Topics & keywords

Keywords

Computer science
Artificial intelligence
Feature learning
Natural language processing
Speech processing
Keyword spotting
Multi-task learning
Supervised learning

UN Sustainable Development Goals

Quality Education
