Article · Jan 1, 2020 · Gold OA

On the Sentence Embeddings from Pre-trained Language Models

Carnegie Mellon University

Indexed in Crossref

Abstract

Pre-trained contextual representations like BERT have achieved great success in natural language processing. However, sentence embeddings from pre-trained language models without fine-tuning have been found to capture the semantic meaning of sentences poorly. In this paper, we argue that the semantic information in the BERT embeddings is not fully exploited. We first reveal the theoretical connection between the masked language model pre-training objective and the semantic similarity task, and then analyze the BERT sentence embeddings empirically. We find that BERT always induces a non-smooth anisotropic semantic space of sentences, which harms its performance on semantic similarity. To…

Citation impact

  • Total citations: 539
  • FWCI: 51.27
  • Percentile: 100%
  • References: 45

Authors

6

Topics & keywords

Keywords
  • Computer science
  • Natural language processing
  • Sentence
  • Semantic similarity
  • Artificial intelligence
  • Embedding
  • Similarity (geometry)
  • Language model
UN Sustainable Development Goals
  • Quality Education