Expectation vs. Experience: Evaluating the Usability of Code Generation Tools Powered by Large Language Models

Harvard University Press · Purdue University West Lafayette · +1 more institution

Indexed in Crossref

Abstract

Recent advances in Large Language Models (LLMs) have made automatic code generation possible for real-world programming tasks in general-purpose programming languages such as Python. However, there are few human studies on the usability of these tools and how they fit the programming workflow. In this work, we conducted a within-subjects user study with 24 participants to understand how programmers use and perceive Copilot, an LLM-based code generation tool. We found that, while Copilot did not necessarily improve the task completion time or success rate, most participants preferred to use Copilot in daily programming tasks, since Copilot often provided a useful starting point and saved the effort of searching…

Citation impact

Total citations: 537
FWCI: 73.28
Percentile: 100%
References: 30

Authors

3

Topics & keywords

Keywords
  • Computer science
  • Usability
  • Debugging
  • Workflow
  • Task (project management)
  • Python (programming language)
  • Software engineering
  • Human–computer interaction

Funding