Finetuning Large Language Models for Vulnerability Detection
Research Institute of Semiconductor Devices · Satbayev University · Institute of Experimental Cardiology
Abstract
This paper presents the results of finetuning large language models (LLMs) to detect vulnerabilities in Java source code. We leverage WizardCoder, a recent improvement of the state-of-the-art LLM StarCoder, and adapt it for vulnerability detection through further finetuning. To accelerate training, we modify WizardCoder’s training procedure; we also investigate optimal training regimes. For the imbalanced dataset, which contains many more negative examples than positive ones, we explore different techniques to improve classification performance. The finetuned WizardCoder model improves ROC AUC and F1 on both balanced and imbalanced vulnerability datasets over a CodeBERT-like model,…
Citation impact
- FWCI: 149.75
- Percentile: 100%
- References: 20
Authors
- Aleksei Shestov (corresponding author)
- Rodion Levichev, Research Institute of Semiconductor Devices
- Ravil Mussabayev, Satbayev University
- Evgeny Maslov, Institute of Experimental Cardiology
- Pavel Zadorozhny, Research Institute of Semiconductor Devices
Topics & keywords
- Computer science
- Vulnerability (computing)
- Computer security