Holistic Evaluation of Language Models

Bommasani, Rishi; Liang, Percy; Lee, Tong

doi:10.1111/nyas.15007

Article · Annals of the New York Academy of Sciences · May 25, 2023 · Bronze OA

Stanley Foundation · Stanford University

Indexed in: Crossref, PubMed

Abstract

Language models (LMs) like GPT-3, PaLM, and ChatGPT are the foundation for almost all major language technologies, but their capabilities, limitations, and risks are not well understood. We present Holistic Evaluation of Language Models (HELM) to improve the transparency of LMs. LMs can serve many purposes and their behavior should satisfy many desiderata. To navigate the vast space of potential scenarios and metrics, we taxonomize the space and select representative subsets. We evaluate models on 16 core scenarios and 7 metrics, exposing important trade-offs. We supplement our core evaluation with seven targeted evaluations to deeply analyze specific aspects (including world knowledge, reasoning,…

Citation impact

Total citations: 434
FWCI: 71.30
Percentile: 100%
References: 77

Authors: 3

Topics & keywords

Keywords: Computer science
