Preprint | arXiv.org | Jan 1, 2025 | Green OA

Can Open Large Language Models Catch Vulnerabilities?

Indexed in: arXiv, DataCite

Abstract

As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs (Llama3, Codestral, and Deepseek R1) using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration (CWE) categories. Adopting a closed-world classification setup, we assess each model’s performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings…
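The two measurements named in the abstract, detecting that a vulnerability is present and assigning the correct CWE label within a fixed label set, could be scored roughly as follows. This is only a minimal sketch under stated assumptions, not the authors' evaluation code: the function name `score`, the `"none"` label for non-vulnerable samples, the placeholder labels `CWE-X` and `CWE-Y`, and the choice to count out-of-set answers as wrong are all hypothetical, and the abstract does not list the eight categories actually used.

```python
def score(gold, pred, labels, safe="none"):
    """Return (detection_accuracy, cwe_accuracy) for paired label lists.

    Assumptions (not from the paper): `safe` marks non-vulnerable samples,
    and any prediction outside the closed label set counts as incorrect.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    allowed = set(labels) | {safe}
    detect_hits = 0   # model agrees on vulnerable vs. not vulnerable
    cwe_hits = 0      # model names the exact gold CWE label
    vulnerable = 0    # number of gold samples that are vulnerable

    for g, p in zip(gold, pred):
        in_closed_world = p in allowed
        if in_closed_world and (g != safe) == (p != safe):
            detect_hits += 1
        if g != safe:
            vulnerable += 1
            if p == g:
                cwe_hits += 1

    detection_acc = detect_hits / len(gold) if gold else 0.0
    cwe_acc = cwe_hits / vulnerable if vulnerable else 0.0
    return detection_acc, cwe_acc


# Placeholder labels; the real study uses eight CWE categories not listed here.
gold = ["CWE-X", "CWE-Y", "none", "CWE-X"]
pred = ["CWE-X", "CWE-X", "none", "CWE-999"]  # last answer is outside the label set
detection_acc, cwe_acc = score(gold, pred, labels=["CWE-X", "CWE-Y"])
```

Keeping the two scores separate shows how a model can flag code as vulnerable correctly while still naming the wrong category.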

Citation impact

Total citations: 484
FWCI: 1223.88
Percentile: 100%
References: 0

Authors

1

Topics & keywords

Keywords
  • Reinforcement learning
  • Reinforcement
  • Computer science
  • Cognitive science
  • Artificial intelligence
  • Psychology
  • Social psychology