Can Open Large Language Models Catch Vulnerabilities?
Indexed in: arXiv, DataCite
Abstract
As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs (Llama3, Codestral, and DeepSeek R1) using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration (CWE) categories. Adopting a closed-world classification setup, we assess each model’s performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings…
Citation impact
- Total citations: 484
- FWCI: 1223.88
- Percentile: 100%
- References: 0
Authors: 1
Topics & keywords
Keywords
- Reinforcement learning
- Reinforcement
- Computer science
- Cognitive science
- Artificial intelligence
- Psychology
- Social psychology