How Far Are We? An Empirical Analysis of Current Vulnerability Localization Approaches

📅 2025-09-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Open-source software vulnerability patch detection faces challenges including poor scalability, low accuracy, and weak generalization. This paper proposes a two-stage vulnerability localization framework: (1) a version-history-driven candidate patch filtering stage that reduces the search space; and (2) an LLM-based multi-turn dialogue voting mechanism that integrates semantic understanding with knowledge augmentation, without relying on complex model architectures. The method follows a “semantics-first, version-driven, knowledge-guided” principle and avoids the timeliness limitations of web crawling. Evaluated on a dataset of 750 real-world vulnerabilities, the approach is reported to outperform current methods. It targets precise patch identification within large-scale commit histories.

📝 Abstract

Open-source software vulnerability patch detection is a critical component for maintaining software security and ensuring software supply chain integrity. Traditional manual detection methods face significant scalability challenges when processing large volumes of commit histories, while being prone to human errors and omissions. Existing automated approaches, including heuristic-based methods and pre-trained model solutions, suffer from limited accuracy, poor generalization capabilities, and inherent methodological constraints that hinder their practical deployment. To address these fundamental challenges, this paper conducts a comprehensive empirical study of existing vulnerability patch detection methods, revealing four key insights that guide the design of effective solutions: the critical impact of search space reduction, the superiority of pre-trained semantic understanding over architectural complexity, the temporal limitations of web crawling approaches, and the advantages of knowledge-driven methods. Based on these insights, we propose a novel two-stage framework that combines version-driven candidate filtering with large language model-based multi-round dialogue voting to achieve accurate and efficient vulnerability patch identification. Extensive experiments on a dataset containing 750 real vulnerabilities demonstrate that our method outperforms current approaches.

Problem

Research questions and friction points this paper is trying to address.

Evaluating the limitations of current automated vulnerability patch detection

Addressing scalability and accuracy challenges in commit analysis

Proposing an improved framework for software vulnerability identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage framework combining filtering and voting

Version-driven candidate filtering for search reduction

LLM-based multi-round dialogue for accurate identification
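
The two stages listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not describe the filtering rule or the dialogue protocol, so the code assumes that stage one keeps commits dated between the last vulnerable release and the first fixed release, and that stage two takes a simple majority over independent rounds. The names `Commit`, `filter_by_version_window`, `vote_for_patch`, and `keyword_round` are hypothetical, and `keyword_round` is a keyword-matching stand-in for a real LLM call.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Commit:
    """Hypothetical minimal commit record used only for this sketch."""
    sha: str
    committed_at: datetime
    message: str


def filter_by_version_window(
    commits: Sequence[Commit],
    last_vulnerable_release: datetime,
    first_fixed_release: datetime,
) -> list[Commit]:
    """Stage 1 (assumed rule): keep commits made after the last vulnerable
    release and no later than the first fixed release."""
    return [
        c for c in commits
        if last_vulnerable_release < c.committed_at <= first_fixed_release
    ]


def vote_for_patch(
    candidates: Sequence[Commit],
    ask_round: Callable[[Sequence[Commit], int], str],
    rounds: int = 3,
) -> Optional[str]:
    """Stage 2 (assumed rule): each round nominates one candidate SHA and the
    most frequently nominated SHA wins. Ties go to the SHA nominated first."""
    if not candidates:
        return None
    valid = {c.sha for c in candidates}
    votes: Counter[str] = Counter()
    for round_index in range(rounds):
        choice = ask_round(candidates, round_index)
        if choice in valid:  # ignore answers that name no known candidate
            votes[choice] += 1
    if not votes:
        return None
    return votes.most_common(1)[0][0]


def keyword_round(candidates: Sequence[Commit], round_index: int) -> str:
    """Stand-in for an LLM dialogue round: picks the first candidate whose
    message mentions 'overflow', else the first candidate."""
    for c in candidates:
        if "overflow" in c.message.lower():
            return c.sha
    return candidates[0].sha
```

In practice `ask_round` would wrap a model call that receives the vulnerability description and the candidate diffs; passing it as a callable keeps the voting logic testable without network access.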
