Article · Feb 6, 2024 · Closed access

CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models

Peking University · Huawei Technologies (China)

Indexed in Crossref

Abstract

Code generation models based on the pre-training and fine-tuning paradigm have been increasingly explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the effectiveness of these models, multiple benchmarks (e.g., HumanEval and AiXBench) have been proposed, which include only cases of generating a standalone function, i.e., a function that may invoke or access only built-in functions and standard libraries. However, non-standalone functions, which are typically absent from existing benchmarks, constitute more than 70% of the functions in popular open-source projects, and evaluating models' effectiveness on standalone…

Citation impact

Total citations: 107
FWCI: 74.36
Percentile: 100%
References: 7

Authors

10 authors

Topics & keywords

Keywords
  • Computer science
  • Benchmark (surveying)
  • Code generation
  • Code (set theory)
  • Generative grammar
  • Source code
  • Function (biology)
  • Software engineering
UN Sustainable Development Goals
  • Industry, innovation and infrastructure

Funding