XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

Babu, Arun; Wang, Changhan; Tjandra, Andros; Lakhotia, Kushal; Xu, Qiantong; Goyal, Naman; Singh, Kritika; Platen, Patrick von; Saraf, Yatharth; Pino, Juan; Baevski, Alexei; Conneau, Alexis; Auli, Michael

doi:10.21437/interspeech.2022-143

Article; Interspeech 2022; Sep 16, 2022; Closed access

Indexed in Crossref

Abstract

This paper presents XLS-R, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0. We train models with up to 2B parameters on nearly half a million hours of publicly available speech audio in 128 languages, an order of magnitude more public data than the largest known prior work. Our evaluation covers a wide range of tasks, domains, data regimes and languages, both high and low-resource. On the CoVoST-2 speech translation benchmark, we improve the previous state of the art by an average of 7.4 BLEU over 21 translation directions into English. For speech recognition, XLS-R improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error…

Citation impact

Total citations: 512

FWCI: 48.81
Percentile: 100%
References: 84

Authors: 13

Topics & keywords

Keywords

Computer science
Scale (ratio)
Speech recognition
Representation (politics)
Artificial intelligence
Natural language processing
Pattern recognition (psychology)
Physics

UN Sustainable Development Goals

Quality Education
