Preprint | arXiv (Cornell University) | Oct 5, 2022 | Green OA

GLM-130B: An Open Bilingual Pre-trained Model

Indexed in: arXiv, DataCite

Abstract

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks…

Citation impact

296 total citations

Authors

18 authors

Topics & keywords

Keywords
  • Computer science
  • Leverage (statistics)
  • Inference
  • Security token
  • Fluency
  • Language model
  • Artificial intelligence
  • Natural language processing
UN Sustainable Development Goals
  • Quality Education