Finetuning Large Language Models for Vulnerability Detection

Shestov, Aleksei; Levichev, Rodion; Mussabayev, Ravil; Maslov, Evgeny; Zadorozhny, Pavel; Cheshkov, Anton; Mussabayev, Rustam; Toleu, Alymzhan; Tolegen, Gulmira; Krassovitskiy, Alexander

doi:10.1109/access.2025.3546700

Article · IEEE Access · Jan 1, 2025 · Gold OA

Research Institute of Semiconductor Devices · Satbayev University · Institute of Experimental Cardiology

Indexed in: Crossref, DOAJ

Abstract

This paper presents the results of finetuning large language models (LLMs) to detect vulnerabilities in Java source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder’s training procedure and investigate optimal training regimes. For the imbalanced dataset, which has many more negative examples than positive ones, we also explore different techniques to improve classification performance. The finetuned WizardCoder model improves ROC AUC and F1 on balanced and imbalanced vulnerability datasets over a CodeBERT-like model,…
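
To make the two reported measures concrete, here is a minimal sketch of how ROC AUC and F1 can be computed for a binary vulnerability classifier on an imbalanced label set. It is not the authors' evaluation code: the function names, the toy labels and scores, and the 0.5 decision threshold are all assumptions chosen for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive example is
    scored above a random negative one (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def f1_score(labels, scores, threshold=0.5):
    """F1 of the positive (vulnerable) class at a fixed score threshold.
    The 0.5 default is an illustrative assumption, not a value from the paper."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data: 2 vulnerable functions (label 1) among 8.
labels = [1, 0, 0, 0, 0, 0, 0, 1]
scores = [0.9, 0.1, 0.2, 0.6, 0.3, 0.05, 0.4, 0.35]

auc = roc_auc(labels, scores)
f1 = f1_score(labels, scores)
```

ROC AUC is threshold-free and depends only on how positives rank against negatives, whereas F1 depends on the chosen threshold. That difference is one reason both measures are commonly reported together when negatives heavily outnumber positives.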

Citation impact

Total citations: 61

FWCI: 149.75
Percentile: 100%
References: 20

Authors

10 authors

Aleksei Shestov (corresponding author)
Rodion Levichev · Research Institute of Semiconductor Devices
Ravil Mussabayev · Satbayev University
Evgeny Maslov · Institute of Experimental Cardiology
Pavel Zadorozhny · Research Institute of Semiconductor Devices

Topics & keywords

Keywords

Computer science
Vulnerability (computing)
Computer security

Funding

Ministry of Education and Science of the Republic of Kazakhstan
Award: BR21882268