A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists
Helmholtz Institute Jena · Friedrich Schiller University Jena · +9 more institutions
Abstract
Large language models (LLMs) have gained widespread interest owing to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However,…
Citation impact
- FWCI: 21.09
- Percentile: 100%
- References: 68
Authors (35)
- A.H. Mirza (corresponding author)
Helmholtz Institute Jena, Friedrich Schiller University Jena
- Nawaf Alampara
Friedrich Schiller University Jena
- Sreekanth Kunchapu
Friedrich Schiller University Jena
- Martiño Ríos-García
Instituto Nacional del Carbón, Friedrich Schiller University Jena
- Benedict Emoekabu
Friedrich Schiller University Jena
Topics & keywords
- Benchmarking
- Harm
- Process (computing)
- Value (mathematics)
- Chemistry
- Management science
- Computer science
- Cognitive science
- Quality Education
Funding
- National Science Foundation
- U.S. Department of State
- Friedrich-Schiller-Universität Jena
- Government of the United Kingdom
- UK Research and Innovation
- Ministerio de Ciencia, Innovación y Universidades
- US-UK Fulbright Commission
- European Commission
- Deutsche Forschungsgemeinschaft (Award: 497115849)
- Schweizerischer Nationalfonds zur Förderung der Wissenschaftlichen Forschung (Award: 225147)
- Office of Multicultural Interests Department of Local Government and Communities
- Helmholtz Association
- Slezská Univerzita v Opavě
- HORIZON EUROPE Framework Programme
- Consejo Superior de Investigaciones Científicas
- Southeastern Ontario Academic Medical Organization (Award: 101106377)
- Agencia Estatal de Investigación (Award: CNS2022-135474)