Evaluating Large Language Models in Class-Level Code Generation

Du, Xueying; Liu, Mingwei; Wang, Kaixin; Wang, Hanlin; Jun-wei, Liu; Chen, Yixuan; Feng, Jiayi; Sha, Chaofeng; Peng, Xin; Lou, Yiling

doi:10.1145/3597503.3639219

Article | Apr 12, 2024 | Closed access

Xueying Du, Mingwei Liu, Kaixin Wang, Hanlin Wang, Liu Jun-wei

Fudan University

Indexed in Crossref

Abstract

Recently, many large language models (LLMs) have been proposed, showing advanced proficiency in code generation. Meanwhile, many efforts have been dedicated to evaluating LLMs on code generation benchmarks such as HumanEval. Although very helpful for comparing different LLMs, existing evaluations focus on a simple code generation scenario (i.e., function-level or statement-level code generation), which mainly asks LLMs to generate a single code unit (e.g., a function or a statement) for a given natural language description. Such evaluations focus on generating independent and often small-scale code units, leaving it unclear how LLMs perform in real-world software development scenarios.
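The distinction the abstract draws between a single independent code unit and class-level code can be illustrated with a minimal sketch. Both `is_palindrome` and `BankAccount` are hypothetical examples chosen for illustration; they are not taken from the paper or from any benchmark. The first is a standalone function, while the second is a class whose methods share state and call one another, so each method is only correct if the others are.

```python
# Function-level task: a single, self-contained unit with no shared state.
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    cleaned = [ch.lower() for ch in text if ch.isalnum()]
    return cleaned == cleaned[::-1]


# Class-level task: methods share state and depend on one another.
class BankAccount:
    """Hypothetical target class used only for illustration."""

    def __init__(self, balance=0.0):
        self.balance = balance
        self.history = []  # shared state read and written by the methods below

    def deposit(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount
        self.history.append(("deposit", amount))

    def withdraw(self, amount):
        if amount <= 0:
            raise ValueError("amount must be positive")
        if amount > self.balance:
            raise ValueError("insufficient funds")
        self.balance -= amount
        self.history.append(("withdraw", amount))

    def transfer(self, other, amount):
        # Relies on withdraw() and deposit() updating balances consistently.
        self.withdraw(amount)
        other.deposit(amount)
```

In this sketch, `transfer` cannot be generated or tested in isolation: it depends on the fields set in `__init__` and on the behavior of `withdraw` and `deposit`, which is the kind of interdependence a function-level task does not exercise.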

Citation impact

Total citations: 118

FWCI: 36.98
Percentile: 100%
References: 53

Authors: 10

Topics & keywords

Keywords

Statement (logic)
Computer science
Code (set theory)
Code generation
Function (biology)
Class (philosophy)
Natural language generation
Programming language

UN Sustainable Development Goals

Quality Education