CoderEval: A Benchmark of Pragmatic Code Generation with Generative Pre-trained Models
Peking University · Huawei Technologies (China)
Abstract
Code generation models based on the pre-training and fine-tuning paradigm are increasingly being explored by both academia and industry, resulting in well-known industrial models such as Codex, CodeGen, and PanGu-Coder. To evaluate the effectiveness of these models, several benchmarks (e.g., HumanEval and AiXBench) have been proposed; they include only cases of generating a standalone function, i.e., a function that may invoke or access only built-in functions and standard libraries. However, non-standalone functions, which are typically absent from existing benchmarks, constitute more than 70% of the functions in popular open-source projects, and evaluating models' effectiveness on standalone…
Citation impact
- FWCI: 74.36
- Percentile: 100%
- References: 7
Authors: 10
Topics & keywords
- Computer science
- Benchmark (surveying)
- Code generation
- Code (set theory)
- Generative grammar
- Source code
- Function (biology)
- Software engineering
- Industry, innovation and infrastructure