Vulnerability-Affected Versions Identification: How Far Are We?

📅 2025-09-04
🤖 AI Summary
This study addresses the lack of empirical evidence on how well vulnerability-affected version identification tools work in practice. We conduct the first systematic, large-scale empirical evaluation: we construct a high-quality benchmark of 1,128 real-world C/C++ vulnerabilities and rigorously evaluate 12 state-of-the-art tools. The best individual tool reaches only 44.9% accuracy; ensemble strategies improve results by at most 10.1 percentage points, and overall accuracy stays below 60%. The main bottlenecks are overreliance on heuristic rules and insufficient semantic reasoning. We identify critical issues, including patch-matching bias, the distribution of false-positive root causes, and fundamental paradigm limitations, and propose concrete, actionable improvements. To support reproducible research, we publicly release both the benchmark dataset and the evaluation framework as a foundation for next-generation vulnerability impact analysis.
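This page does not say how the tools are combined in the ensemble experiments. As a hedged illustration only, the sketch below assumes each tool outputs, per vulnerability, a set of version strings it considers affected. It compares three simple combination rules: union, intersection and majority vote. The tool outputs, the `VULN-*` IDs and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical per-tool outputs: vulnerability ID -> set of versions flagged as affected.
TOOL_A = {"VULN-1": {"1.0", "1.1", "1.2"}, "VULN-2": {"2.0"}}
TOOL_B = {"VULN-1": {"1.0", "1.1"}, "VULN-2": {"2.0", "2.1"}}
TOOL_C = {"VULN-1": {"1.1"}, "VULN-2": {"2.0"}}
GROUND_TRUTH = {"VULN-1": {"1.0", "1.1"}, "VULN-2": {"2.0"}}


def union(preds):
    """Flag a version if any tool flags it (trades precision for recall)."""
    return set().union(*preds)


def intersection(preds):
    """Flag a version only if every tool flags it (trades recall for precision)."""
    return set.intersection(*preds) if preds else set()


def majority(preds):
    """Flag a version if strictly more than half of the tools flag it."""
    counts = Counter(version for pred in preds for version in pred)
    return {version for version, n in counts.items() if 2 * n > len(preds)}


def combine(tool_outputs, rule):
    """Apply one combination rule per vulnerability across all tools."""
    vulns = set().union(*(out.keys() for out in tool_outputs))
    return {v: rule([out.get(v, set()) for out in tool_outputs]) for v in vulns}


if __name__ == "__main__":
    tools = [TOOL_A, TOOL_B, TOOL_C]
    for rule in (union, intersection, majority):
        combined = combine(tools, rule)
        print(rule.__name__, combined == GROUND_TRUTH)
```

In this toy data, each individual tool gets one of the two vulnerabilities wrong. Majority vote still reproduces `GROUND_TRUTH`. Union over-reports, because it keeps 1.2 and 2.1. Intersection under-reports, because it drops 1.0. Real tools often make correlated mistakes, so gains like this are not guaranteed on real data.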

📝 Abstract
Identifying which software versions are affected by a vulnerability is critical for patching and risk mitigation. Despite a growing body of tools, their real-world effectiveness remains unclear due to narrow evaluation scopes often limited to early SZZ variants, outdated techniques, and small or coarse-grained datasets. In this paper, we present the first comprehensive empirical study of vulnerability-affected version identification. We curate a high-quality benchmark of 1,128 real-world C/C++ vulnerabilities and systematically evaluate 12 representative tools from both the tracing and matching paradigms across four dimensions: effectiveness at both the vulnerability and version levels, root causes of false positives and negatives, sensitivity to patch characteristics, and ensemble potential. Our findings reveal fundamental limitations: no tool exceeds 45.0% accuracy, with key challenges stemming from heuristic dependence, limited semantic reasoning, and rigid matching logic. Patch structures such as add-only and cross-file changes further hinder performance. Although ensemble strategies can improve results by up to 10.1%, overall accuracy remains below 60.0%, highlighting the need for fundamentally new approaches. Moreover, our study offers actionable insights to guide tool development, combination strategies, and future research in this critical area. Finally, we release the replicated code and benchmark on our website to encourage future contributions.
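The abstract notes that add-only patches hurt performance. One plausible reason is sketched here under stated assumptions. Classic SZZ-style tracing blames the lines a fix deletes, so an add-only fix leaves it nothing to blame. A naive matching rule ("vulnerable lines present, fix lines absent") also loses its positive evidence, because there are no deleted lines to look for. The toy `is_affected` function, the `CONTEXT` anchor line and the three version snapshots are hypothetical. They are not the logic of any of the 12 evaluated tools.

```python
# Hypothetical source snapshots of one C function across three releases.
VERSIONS = {
    "0.9": ["int copy(char *src, size_t len) {", "return 0;", "}"],
    "1.0": ["int copy(char *src, size_t len) {", "char buf[16];",
            "memcpy(buf, src, len);", "return 0;", "}"],
    "1.1": ["int copy(char *src, size_t len) {", "char buf[16];",
            "if (len > sizeof(buf)) return -1;",
            "memcpy(buf, src, len);", "return 0;", "}"],
}

# Hypothetical add-only fix: it deletes nothing and adds a bounds check.
DELETED = []
ADDED = ["if (len > sizeof(buf)) return -1;"]
# Hypothetical anchor: an unchanged line next to the fix in the patched file.
CONTEXT = ["memcpy(buf, src, len);"]


def is_affected(version_lines, deleted, added, context=()):
    """Toy matching rule: pre-patch lines present and fix lines absent."""
    present = {line.strip() for line in version_lines}
    required = list(deleted) + list(context)
    # all() over an empty list is True, so an add-only patch with no context
    # gives no positive evidence that the vulnerable code exists at all.
    has_vulnerable_code = all(line.strip() in present for line in required)
    has_fix = any(line.strip() in present for line in added)
    return has_vulnerable_code and not has_fix


if __name__ == "__main__":
    naive = {v: is_affected(src, DELETED, ADDED) for v, src in VERSIONS.items()}
    anchored = {v: is_affected(src, DELETED, ADDED, CONTEXT) for v, src in VERSIONS.items()}
    print("naive:   ", naive)
    print("anchored:", anchored)
```

Without an anchor, the rule flags version 0.9 as affected, even though it predates the vulnerable `memcpy`. That is a false positive. Requiring the unchanged context line removes it. Real tools face a harder form of this problem, because context lines drift across releases and cross-file fixes may have no single anchor.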
Problem

Research questions and friction points this paper is trying to address.

Evaluating the effectiveness of vulnerability-affected version identification tools
Assessing the limitations of existing tools on real-world C/C++ vulnerabilities
Investigating the root causes of false positives and false negatives in identification
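The abstract measures effectiveness "at both vulnerability and version levels" but does not define the two levels on this page. One common reading is assumed here; it is not taken from the paper. A vulnerability counts as correct only if its predicted set of affected versions exactly matches the ground truth. Version-level scoring instead tallies each version decision as a true positive, false positive or false negative. The `score` helper and the `VULN-*` data are hypothetical.

```python
# Hypothetical ground truth and tool output: vulnerability ID -> affected versions.
GROUND_TRUTH = {"VULN-1": {"1.0", "1.1", "1.2"}, "VULN-2": {"2.0"}}
PREDICTED = {"VULN-1": {"1.1", "1.2", "1.3"}, "VULN-2": {"2.0"}}


def score(predicted, truth):
    """Vulnerability-level exact-match accuracy plus version-level TP/FP/FN."""
    exact = tp = fp = fn = 0
    for vuln, gold in truth.items():
        pred = predicted.get(vuln, set())
        exact += int(pred == gold)   # the whole version set must match
        tp += len(pred & gold)       # affected versions correctly flagged
        fp += len(pred - gold)       # unaffected versions wrongly flagged
        fn += len(gold - pred)       # affected versions missed
    return {
        "vuln_accuracy": exact / len(truth) if truth else 0.0,
        "tp": tp, "fp": fp, "fn": fn,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


if __name__ == "__main__":
    print(score(PREDICTED, GROUND_TRUTH))
```

Here the tool reaches 75% version-level precision and recall, yet only half of the vulnerabilities are fully correct. A single extra version (1.3) and a single missed version (1.0) together spoil an otherwise good prediction for VULN-1. Under this reading, a strict vulnerability-level metric can sit well below version-level scores.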
Innovation

Methods, ideas, or system contributions that make the work stand out.

First comprehensive empirical study of vulnerability-affected version identification
Evaluated 12 tools across four key dimensions
Revealed fundamental limitations: no tool exceeds 45% accuracy
Xingchu Chen
Institute of Information Engineering, CAS; School of Cyber Security, UCAS, Beijing, China

Chengwei Liu
Research Assistant Professor, Nanyang Technological University
Open Source Security; Software Supply Chain Security; Program Analysis; Software Maintenance

Jialun Cao
The Hong Kong University of Science and Technology
SE for AI; AI for SE

Yang Xiao
Institute of Information Engineering, CAS; School of Cyber Security, UCAS, Beijing, China

Xinyue Cai
Institute of Information Engineering, CAS; School of Cyber Security, UCAS, Beijing, China

Yeting Li
Institute of Information Engineering, Chinese Academy of Sciences
Software Security; Program Analysis; Automata Theory

Jingyi Shi
Unknown affiliation
software security; software supply chain security; AI system security

Tianqi Sun
Institute of Information Engineering, CAS; School of Cyber Security, UCAS, Beijing, China

Haiming Chen
Institute of Software, UCAS, Beijing, China

Wei Huo
Wireless Technology Lab, 2012, Huawei
Agentic AI; Multi-agent systems