CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models

Yu, Hao; Bo, Shen; Ran, Dezhi; Zhang, J. Y.; Zhang, Qi Rong; Ma, Yuchi; Liang, Guangtai; Li, Ying; Wang, Qianxiang; Xie, Tao

doi:10.1145/3597503.3623316

Article · Feb 6, 2024 · Closed access

Peking University · Huawei Technologies (China)

Indexed in Crossref

Abstract

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly attempted by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the effectiveness of these models, multiple existing benchmarks (e.g., HumanEval and AiXBench) are proposed, including only cases of generating a standalone function, i.e., a function that may invoke or access only built-in functions and standard libraries. However, non-standalone functions, which typically are not included in the existing benchmarks, constitute more than 70% of the functions in popular open-source projects, and evaluating models' effectiveness on standalone…

Citation impact

Total citations: 107
FWCI: 74.36
Percentile: 100%
References: 7
Authors: 10

Topics & keywords

Keywords

Computer science
Benchmark (surveying)
Code generation
Code (set theory)
Generative grammar
Source code
Function (biology)
Software engineering

UN Sustainable Development Goals

Industry, innovation and infrastructure
