The article examines the challenges posed by vulnerabilities in Open-Source Software (OSS) and the difficulty of detecting them automatically. The authors propose an automated data collection framework for building a high-quality vulnerability dataset named ReposVul. The framework consists of three modules: a vulnerability untangling module, a multi-granularity dependency extraction module, and a trace-based filtering module. The resulting dataset contains over 6,000 CVE entries across more than 1,000 projects and four programming languages, giving researchers a repository-level resource for training and evaluating vulnerability detection methods.
Publication date: 26 Jan 2024
Project Page: Not provided
Paper: https://arxiv.org/pdf/2401.13169
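
The summary names the framework's three modules but does not describe their internals. As a rough illustration only, the sketch below assumes each module can be modeled as a stage that either annotates a patch record or drops it. The `PatchRecord` type, the `notes` field, the sample CVE identifiers, and the rule inside each stage are hypothetical placeholders, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class PatchRecord:
    """Hypothetical record for one CVE patch; not the ReposVul schema."""
    cve_id: str
    changed_files: List[str]
    notes: Dict[str, object] = field(default_factory=dict)


# A stage returns the (possibly annotated) record, or None to drop it.
Stage = Callable[[PatchRecord], Optional[PatchRecord]]


def untangle(record: PatchRecord) -> Optional[PatchRecord]:
    # Placeholder rule (assumption): treat non-test files as fix-related.
    related = [f for f in record.changed_files if "test" not in f.lower()]
    if not related:
        return None
    record.notes["fix_related_files"] = related
    return record


def extract_dependencies(record: PatchRecord) -> Optional[PatchRecord]:
    # Placeholder (assumption): reserve an empty dependency list per file.
    # A real extractor would fill these in at several granularities.
    files = record.notes.get("fix_related_files", [])
    record.notes["dependencies"] = {f: [] for f in files}
    return record


def trace_filter(record: PatchRecord) -> Optional[PatchRecord]:
    # Placeholder rule (assumption): drop records pre-marked as superseded.
    return None if record.notes.get("superseded") else record


def run_pipeline(records: List[PatchRecord],
                 stages: List[Stage]) -> List[PatchRecord]:
    """Apply the stages in order, keeping records that survive all of them."""
    kept = []
    for record in records:
        current: Optional[PatchRecord] = record
        for stage in stages:
            current = stage(current)
            if current is None:
                break
        if current is not None:
            kept.append(current)
    return kept


# Made-up sample input; the CVE identifiers are not real entries.
sample = [
    PatchRecord("CVE-0000-0001", ["src/parse.c", "tests/test_parse.c"]),
    PatchRecord("CVE-0000-0002", ["tests/test_io.c"]),
    PatchRecord("CVE-0000-0003", ["lib/io.c"], {"superseded": True}),
]
result = run_pipeline(sample, [untangle, extract_dependencies, trace_filter])
```

Modeling the stages as plain functions keeps the ordering explicit and lets any placeholder be swapped for a real analysis without touching `run_pipeline`.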