MatSciBERT: A materials domain language model for text mining and information extraction

Gupta, Tanishq; Zaki, Mohd; Krishnan, N. M. Anoop; Mausam, Mausam

doi:10.1038/s41524-022-00784-w

Article; npj Computational Materials; May 3, 2022; Gold OA

Indian Institute of Technology Delhi

Indexed in: Crossref, DOAJ

Abstract

A large amount of materials science knowledge is generated and stored as text published in peer-reviewed scientific literature. While recent developments in natural language processing, such as Bidirectional Encoder Representations from Transformers (BERT) models, provide promising information extraction tools, these models may yield suboptimal results when applied to the materials domain, since they are not trained on materials-science-specific notation and jargon. Here, we present a materials-aware language model, namely MatSciBERT, trained on a large corpus of peer-reviewed materials science publications. We show that MatSciBERT outperforms SciBERT, a language model trained on a scientific corpus, and…

Citation impact

Total citations: 297

FWCI: 18.91
Percentile: 100%
References: 53

Authors: 4

Topics & keywords

Keywords

Computer science
Natural language processing
Information extraction
Relationship extraction
Transformer
Domain (mathematical analysis)
Artificial intelligence
Notation

UN Sustainable Development Goals

Quality Education
