Structured information extraction from scientific text with large language models

Dagdelen, John; Dunn, Alexander; Lee, Sang‐Hoon; Walker, Nicholas; Rosen, Andrew; Ceder, Gerbrand; Persson, Kristin A.; Jain, Anubhav

doi:10.1038/s41467-024-45563-x

Article · Nature Communications · Feb 15, 2024 · Gold OA

Lawrence Berkeley National Laboratory · University of California, Berkeley

Indexed in: Crossref, DOAJ, PubMed

Abstract

Extracting structured knowledge from scientific text remains a challenging task for machine learning models. Here, we present a simple approach to joint named entity recognition and relation extraction and demonstrate how pretrained large language models (GPT-3, Llama-2) can be fine-tuned to extract useful records of complex scientific knowledge. We test three representative tasks in materials chemistry: linking dopants and host materials, cataloging metal-organic frameworks, and general composition/phase/morphology/application information extraction. Records are extracted from single sentences or entire paragraphs, and the output can be returned as simple English sentences or a more structured format such as…
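To make the idea of a structured output record concrete, here is a minimal Python sketch for the dopant and host linking task named in the abstract. It assumes the fine-tuned model returns its completion as JSON text. The field names `host` and `dopants`, the helper `parse_records`, and the example completion are all hypothetical illustrations, not the schema or code used in the paper.

```python
import json


def parse_records(completion: str) -> list[dict]:
    """Parse a model completion that is expected to hold JSON records.

    The "host"/"dopants" field names are assumptions made for this sketch.
    """
    try:
        data = json.loads(completion)
    except json.JSONDecodeError:
        return []  # malformed output yields no records

    # Accept either a single record or a list of records.
    if isinstance(data, dict):
        data = [data]
    if not isinstance(data, list):
        return []

    records = []
    for item in data:
        # Keep only entries that name a host material as a string.
        if not isinstance(item, dict) or not isinstance(item.get("host"), str):
            continue
        dopants = item.get("dopants", [])
        if not isinstance(dopants, list):
            continue
        records.append({"host": item["host"], "dopants": [str(d) for d in dopants]})
    return records


# Hypothetical completion for a sentence about Al- and Ga-doped ZnO.
example_completion = '[{"host": "ZnO", "dopants": ["Al", "Ga"]}]'
records = parse_records(example_completion)
print(records)
```

Validating the completion before use matters because a generative model is not guaranteed to emit well-formed JSON; in this sketch, malformed or unexpected output simply produces an empty list.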

Citation impact

Total citations: 547

FWCI: 59.03
Percentile: 100%
References: 68

Authors: 8

Topics & keywords

Keywords

Computer science
Relationship extraction
Information extraction
Task (project management)
Information retrieval
JSON
Natural language processing
Simple (philosophy)

UN Sustainable Development Goals

Quality Education
