Abstract

While BERT is an effective method for learning monolingual sentence embeddings for semantic similarity and embedding-based transfer learning (Reimers and Gurevych, 2019), BERT-based cross-lingual sentence embeddings have yet to be explored. We systematically investigate methods for learning multilingual sentence embeddings by combining the best methods for learning monolingual and cross-lingual representations including: masked language modeling (MLM), translation language modeling (TLM) (Conneau and Lample, 2019), dual encoder translation ranking. We show that introducing a pre-trained multilingual language model dramatically reduces the amount of parallel training data required to achieve good performance by…

Citation impact

Total citations: 485
FWCI: 40.19
Percentile: 100%
References: 50

Authors

5

Topics & keywords

Keywords
  • Computer science
  • Natural language processing
  • Sentence
  • Artificial intelligence
  • Embedding
  • Margin (machine learning)
  • Language model
  • Encoder
UN Sustainable Development Goals
  • Quality Education