Preprint | arXiv (Cornell University) | Nov 16, 2022 | Green OA

Galactica: A Large Language Model for Science

Indexed in: arXiv, DataCite

Abstract

Information overload is a major obstacle to scientific progress. The explosive growth in scientific literature and data has made it ever harder to discover useful insights in a large mass of information. Today scientific knowledge is accessed through search engines, but they are unable to organize scientific knowledge alone. In this paper we introduce Galactica: a large language model that can store, combine and reason about scientific knowledge. We train on a large scientific corpus of papers, reference material, knowledge bases and many other sources. We outperform existing models on a range of scientific tasks. On technical knowledge probes such as LaTeX equations, Galactica outperforms the latest GPT-3 by…

Citation impact

259 total citations
References: 0

Authors

9

Topics & keywords

Keywords
  • Computer science
  • Obstacle
  • Sociology of scientific knowledge
  • Language model
  • Data science
  • Artificial intelligence
  • Sociology
  • Social science
UN Sustainable Development Goals
  • Quality Education