Article · Oct 3, 2023 · Gold OA

DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection

University of California, Berkeley · King Abdulaziz City for Science and Technology · +1 more institution

Indexed in Crossref

Abstract

We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites and extracting vulnerability-fixing commits and source code from the corresponding projects. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. It covers 295 more projects than all previous datasets combined.

Citation impact

Total citations: 193
FWCI: 84.42
Percentile: 100%
References: 17

Authors

5

Topics & keywords

Keywords
  • Crawling
  • Computer science
  • Source code
  • Vulnerability (computing)
  • Code (set theory)
  • Data mining
  • Artificial intelligence
  • Computer security

Funding