Article
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jan 1, 2022
Hybrid OA
Language-agnostic BERT Sentence Embedding
Indexed in Crossref
Abstract
While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored. We systematically investigate methods for learning multilingual sentence embeddings by combining the best methods for learning monolingual and cross-lingual representations, including masked language modeling (MLM), translation language modeling (TLM) (Conneau and Lample, 2019), dual encoder translation ranking. We show that introducing a pre-trained multilingual language model dramatically reduces the amount of parallel training data required to achieve good performance by…
Citation impact
485 total citations
- FWCI: 40.19
- Percentile: 100%
- References: 50
Authors: 5
Topics & keywords
Topics
Keywords
- Computer science
- Natural language processing
- Sentence
- Artificial intelligence
- Embedding
- Margin (machine learning)
- Language model
- Encoder
UN Sustainable Development Goals
- Quality Education