Preprint · Jan 1, 2019 · Gold OA

What Does BERT Learn about the Structure of Language?

Institut national de recherche en informatique et en automatique · Sorbonne Université

Indexed in Crossref

Abstract

BERT is a recent language representation model that has performed surprisingly well on diverse language understanding benchmarks. This result suggests that BERT networks capture structural information about language. In this work, we provide novel support for this claim by performing a series of experiments to unpack the elements of English language structure learned by BERT. We first show that BERT's phrasal representation captures phrase-level information in the lower layers. We also show that BERT's intermediate layers encode a rich hierarchy of linguistic information, with surface features at the bottom, syntactic features in the middle, and semantic features at the top. BERT turns out to…

Citation impact

Total citations: 1,212
FWCI: 97.93
Percentile: 100%
References: 32

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Hierarchy
  • Natural language processing
  • Artificial intelligence
  • Representation (politics)
  • Dependency (UML)
  • Phrase
  • Language model
UN Sustainable Development Goals
  • Quality Education
