GLM-130B: An Open Bilingual Pre-trained Model
Indexed in: arXiv, DataCite
Abstract
We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks…
Citation impact
296 total citations
- FWCI: —
- Percentile: —
- References: 0
Authors
18
Topics & keywords
Topics
Keywords
- Computer science
- Leverage (statistics)
- Inference
- Security token
- Fluency
- Language model
- Artificial intelligence
- Natural language processing
UN Sustainable Development Goals
- Quality Education