GLM-130B: An Open Bilingual Pre-trained Model

Zeng, Aohan; Liu, Xiao; Du, Zhengxiao; Wang, Zihan; Lai, Hanyu; Ding, Ming; Yang, Zhuoyi; Xu, Yifan; Zheng, Wendi; Xia, Xiao; Tam, Weng Lam; Ma, Zixuan; Xue, Yufei; Zhai, Jidong; Chen, Wenguang; Zhang, Peng; Dong, Yuxiao; Tang, Jie

doi:10.48550/arxiv.2210.02414

Preprint, arXiv (Cornell University), Oct 5, 2022, Green OA

Indexed in: arXiv, DataCite

Abstract

We introduce GLM-130B, a bilingual (English and Chinese) pre-trained language model with 130 billion parameters. It is an attempt to open-source a 100B-scale model at least as good as GPT-3 (davinci) and unveil how models of such a scale can be successfully pre-trained. Over the course of this effort, we face numerous unexpected technical and engineering challenges, particularly on loss spikes and divergence. In this paper, we introduce the training process of GLM-130B including its design choices, training strategies for both efficiency and stability, and engineering efforts. The resultant GLM-130B model offers significant outperformance over GPT-3 175B (davinci) on a wide range of popular English benchmarks…

Citation impact

296 total citations

References: 0

Authors: 18

Topics & keywords

Keywords

Computer science
Leverage (statistics)
Inference
Security token
Fluency
Language model
Artificial intelligence
Natural language processing

UN Sustainable Development Goals

Quality Education
