Can Open Large Language Models Catch Vulnerabilities?

DeepSeek-AI

doi:10.4230/oasics.icpec.2025.4

Preprint | arXiv.org | Jan 1, 2025 | Green OA

Indexed in: arXiv, DataCite

Abstract

As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic evaluation of three state-of-the-art LLMs (Llama3, Codestral, and DeepSeek R1) using a carefully filtered subset of the Big-Vul dataset annotated with eight representative Common Weakness Enumeration (CWE) categories. Adopting a closed-world classification setup, we assess each model’s performance in both identifying the presence of vulnerabilities and mapping them to the correct CWE label. Our findings…
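The two-part assessment described in the abstract (presence of a vulnerability, then the correct CWE label) can be illustrated with a minimal scoring sketch. It is not the authors' evaluation code. The label set `LABELS`, the function `score`, and the convention that `None` marks a sample as not vulnerable are all assumptions made for illustration; the abstract does not list the eight CWE categories or say whether non-vulnerable samples are included.

```python
from typing import Dict, Optional, Sequence

# Hypothetical closed label set. The abstract does not name the eight CWE
# categories, so these IDs are placeholders only.
LABELS = frozenset({"CWE-20", "CWE-119", "CWE-125", "CWE-787"})


def score(gold: Sequence[Optional[str]],
          predicted: Sequence[Optional[str]]) -> Dict[str, float]:
    """Score detection and CWE labelling separately.

    gold[i] is a CWE ID from LABELS for a vulnerable sample, or None for a
    clean one. predicted[i] is the model's answer under the same convention.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    for g in gold:
        if g is not None and g not in LABELS:
            raise ValueError(f"gold label outside the closed set: {g}")

    # Normalise case and whitespace in model answers; None stays None.
    preds = [p.strip().upper() if p is not None else None for p in predicted]

    # Detection: did the model agree on vulnerable vs. not vulnerable?
    detect_hits = sum((g is not None) == (p is not None)
                      for g, p in zip(gold, preds))

    # Classification: among truly vulnerable samples, is the label exact?
    # An answer outside LABELS counts as detected but mislabelled.
    vulnerable = [(g, p) for g, p in zip(gold, preds) if g is not None]
    label_hits = sum(g == p for g, p in vulnerable)

    return {
        "detection_accuracy": detect_hits / len(gold) if gold else 0.0,
        "cwe_accuracy": label_hits / len(vulnerable) if vulnerable else 0.0,
    }


result = score(
    gold=["CWE-119", None, "CWE-20", "CWE-125"],
    predicted=["cwe-119", None, "CWE-787", None],
)
```

Keeping the two metrics apart shows how a model can flag vulnerable code correctly while still assigning the wrong CWE label, which is the distinction the abstract draws.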

Citation impact

Total citations: 484

FWCI: 1223.88
Percentile: 100%
References: 0

Authors

DeepSeek-AI (corresponding author)

Topics & keywords

Keywords

Reinforcement learning
Reinforcement
Computer science
Cognitive science
Artificial intelligence
Psychology
Social psychology
