MatSciBERT: A materials domain language model for text mining and information extraction
Indian Institute of Technology Delhi
Abstract
A large amount of materials science knowledge is generated and stored as text published in peer-reviewed scientific literature. While recent developments in natural language processing, such as Bidirectional Encoder Representations from Transformers (BERT) models, provide promising information extraction tools, these models may yield suboptimal results when applied to the materials domain, since they are not trained on materials-science-specific notation and jargon. Here, we present a materials-aware language model, namely MatSciBERT, trained on a large corpus of peer-reviewed materials science publications. We show that MatSciBERT outperforms SciBERT, a language model trained on a scientific corpus, and…
Citation impact
- FWCI: 18.91
- Percentile: 100%
- References: 53
Authors
- 4
Topics & keywords
- Computer science
- Natural language processing
- Information extraction
- Relationship extraction
- Transformer
- Domain (mathematical analysis)
- Artificial intelligence
- Notation
- Quality Education
Funding
- International Business Machines Corporation
- Bloomberg L.P.
- Google
- Department of Science and Technology, Ministry of Science and Technology, India (Awards: ECR/2018/002228, DST/INSPIRE/04/2016/002774)
- Indian Space Research Organisation
- Ministry of Education, India
- Indian Institute of Technology Delhi
- Science and Engineering Research Board (Award: ECR/2018/002228)
- Board of Research in Nuclear Sciences (Award: 53/20/01/2021-BRNS)