Article
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Jan 1, 2022
Hybrid OA
LinkBERT: Pretraining Language Models with Document Links
Indexed in Crossref
Abstract
Language model (LM) pretraining captures various knowledge from text corpora, helping downstream NLP tasks. However, existing methods such as BERT model a single document, failing to capture document dependencies and knowledge that spans across documents. In this work, we propose LinkBERT, an effective LM pretraining method that incorporates document links, such as hyperlinks. Given a pretraining corpus, we view it as a graph of documents, and create LM inputs by placing linked documents in the same context. We then train the LM with two joint self-supervised tasks: masked language modeling and our newly proposed task, document relation prediction. We study LinkBERT in two domains: general domain (pretrained…
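The input-construction step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the toy corpus `DOCS`, the link graph `LINKS`, the helper `make_pair`, and the three-way label set `RELATIONS` are all assumptions introduced here, since the abstract names the document relation prediction task without listing its labels. Each example places an anchor segment and a second segment in one context and records how the two are related.

```python
import random

# Hypothetical toy corpus: each document is a list of text segments.
DOCS = {
    "A": ["a0", "a1", "a2"],
    "B": ["b0", "b1"],
    "C": ["c0", "c1"],
}
# Hypothetical hyperlink graph: document id -> ids of documents it links to.
LINKS = {"A": ["B"], "B": [], "C": []}

# Assumed label set for document relation prediction (not listed in the abstract).
RELATIONS = ("contiguous", "random", "linked")


def make_pair(doc_id, seg_idx, relation, rng):
    """Return (input_text, label_id) for one anchor segment and a chosen relation."""
    anchor = DOCS[doc_id][seg_idx]
    if relation == "contiguous":
        # Next segment of the same document.
        partner = DOCS[doc_id][seg_idx + 1]
    elif relation == "linked":
        # A segment from a document that the anchor's document links to.
        target = rng.choice(LINKS[doc_id])
        partner = rng.choice(DOCS[target])
    elif relation == "random":
        # A segment from an unrelated document: not the same one, not linked from it.
        candidates = [d for d in DOCS if d != doc_id and d not in LINKS[doc_id]]
        partner = rng.choice(DOCS[rng.choice(candidates)])
    else:
        raise ValueError(f"unknown relation: {relation}")
    # Both segments share one context; the label is the relation prediction target.
    return f"[CLS] {anchor} [SEP] {partner} [SEP]", RELATIONS.index(relation)


rng = random.Random(0)
examples = [make_pair("A", 0, r, rng) for r in RELATIONS]
```

In an actual pretraining pipeline, token masking for masked language modeling would be applied to each input text, and both objectives would be trained jointly. That part is omitted here.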
Citation impact
- Total citations: 291
- FWCI: 28.25
- Percentile: 100%
- References: 78
Authors: 3
Topics & keywords
Topics
Keywords
- Computer science
- Hyperlink
- Language model
- Natural language processing
- Artificial intelligence
- Information retrieval
- Context (archaeology)
- Graph
UN Sustainable Development Goals
- Quality Education
Funding
- National Science Foundation (Awards: 1835598, 1552635, CAREER, 1918940, IIS-2030477)
- Microsoft Research
- UnitedHealth Group
- National Institutes of Health (Award: R56LM013365)
- Defense Advanced Research Projects Agency (Awards: HR00112190039, N660011924033)
- Multidisciplinary University Research Initiative
- Wu Tsai Neurosciences Institute, Stanford University
- Army Research Office (Awards: W911NF-16-1, W911NF-16-1-0342, W911NF-16-1-, W911NF)