Article · Oct 3, 2023 · Gold OA
DiverseVul: A New Vulnerable Source Code Dataset for Deep Learning Based Vulnerability Detection
University of California, Berkeley · King Abdulaziz City for Science and Technology · +1 more institution
Indexed in Crossref
Abstract
We propose and release a new vulnerable source code dataset. We curate the dataset by crawling security issue websites and extracting vulnerability-fixing commits and source code from the corresponding projects. Our new dataset contains 18,945 vulnerable functions spanning 150 CWEs and 330,492 non-vulnerable functions extracted from 7,514 commits. Our dataset covers 295 more projects than all previous datasets combined.
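The function counts in the abstract imply a strongly imbalanced labeling, which matters for anyone training a detector on the data. The sketch below only does arithmetic on the two counts quoted above; the helper name `class_balance` and the dictionary keys are illustrative and do not come from the paper or its released tooling.

```python
# Counts quoted in the abstract.
VULNERABLE_FUNCTIONS = 18_945
NON_VULNERABLE_FUNCTIONS = 330_492


def class_balance(positives: int, negatives: int) -> dict:
    """Summarize the label balance of a binary dataset.

    Hypothetical helper for illustration; not part of the dataset release.
    """
    total = positives + negatives
    return {
        "total": total,
        "positive_fraction": positives / total,
        "negatives_per_positive": negatives / positives,
    }


balance = class_balance(VULNERABLE_FUNCTIONS, NON_VULNERABLE_FUNCTIONS)
print(f"total functions:          {balance['total']:,}")
print(f"vulnerable fraction:      {balance['positive_fraction']:.2%}")
print(f"non-vulnerable per vuln.: {balance['negatives_per_positive']:.1f}")
```

Roughly one function in eighteen is labeled vulnerable, so plain accuracy would likely be a misleading metric on this data; precision, recall, or F1 on the vulnerable class is usually more informative in that situation.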
Citation impact
- Total citations: 193
- FWCI: 84.42
- Percentile: 100%
- References: 17
Authors: 5
Topics & keywords
Keywords
- Crawling
- Computer science
- Source code
- Vulnerability (computing)
- Code (set theory)
- Data mining
- Artificial intelligence
- Computer security