VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software

📅 2025-06-22
🤖 AI Summary
Problem: Precise vulnerable function (VF) localization in open-source software remains challenging for three reasons: existing vulnerability databases carry no VF annotations, description-only search suffers from high noise and lexical gaps when patches are unavailable, and over 26% of VFs lie outside the functions modified by patches. Method: We propose VFArchē, a dual-mode framework that unifies VF localization for patch-present and patch-absent scenarios: with patches, it jointly leverages call-chain reachability analysis and code-change mining; without patches, it combines vulnerability description semantics with cross-modal source-code similarity matching for fine-grained, source-level VF identification. Contribution/Results: VFArchē combines call-graph analysis with multi-granularity semantic alignment to reduce patch dependency and lexical mismatch. Experiments show Mean Reciprocal Rank (MRR) of 1.3× (patch-present) and 1.9× (patch-absent) that of the best baselines; VFArchē also localizes VFs for 43 of 50 newly disclosed vulnerabilities and reduces SCA false positives by 78%–89%.
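The MRR figures above can be read against a small worked computation. The sketch below is illustrative only: it assumes each vulnerability yields a ranked list of candidate functions and that the reciprocal rank is taken at the first true VF in that list. The function name `mean_reciprocal_rank` and the data layout are assumptions for illustration, not the paper's evaluation code.

```python
def mean_reciprocal_rank(ranked_lists, ground_truth):
    """Average of 1/rank of the first correct candidate per vulnerability.

    ranked_lists: dict mapping a vulnerability id to candidate function
                  names, best candidate first (hypothetical layout).
    ground_truth: dict mapping a vulnerability id to the set of true VFs.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for vuln_id, candidates in ranked_lists.items():
        truth = ground_truth.get(vuln_id, set())
        for rank, fn in enumerate(candidates, start=1):
            if fn in truth:
                total += 1.0 / rank  # only the first hit counts
                break
        # A vulnerability with no hit contributes 0 to the sum.
    return total / len(ranked_lists)


# Toy data: hit at rank 2, hit at rank 1, and a miss.
ranked = {"A": ["f", "g", "h"], "B": ["x", "y"], "C": ["p"]}
truth = {"A": {"g"}, "B": {"x"}, "C": {"q"}}
mrr = mean_reciprocal_rank(ranked, truth)  # (1/2 + 1 + 0) / 3 = 0.5
```

Under this definition, a "1.9×" result means the ratio of the two systems' MRR values is 1.9, not that every VF moves up by a fixed number of positions.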

📝 Abstract
Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires knowing the vulnerable function (VF) so that call chains from downstream applications can be traced to it. However, this crucial information is usually unavailable in modern vulnerability databases such as NVD. While directly extracting VFs from the functions modified in vulnerability patches is intuitive, patches are not always available. Moreover, our preliminary study shows that over 26% of VFs are not among the modified functions. Meanwhile, searching for vulnerable functions without patches suffers from overwhelming noise and lexical gaps between descriptions and source code. Given that almost half of the vulnerabilities come with patches, a holistic solution that handles both the patch-present and patch-absent scenarios is required. To meet real-world needs and automatically localize VFs, we present VFArchē, a dual-mode approach designed for disclosed vulnerabilities, applicable with or without available patch links. Experimental results on our constructed benchmark dataset demonstrate significant efficacy on three metrics, achieving 1.3x and 1.9x the Mean Reciprocal Rank of the best baselines in Patch-present and Patch-absent modes, respectively. Moreover, VFArchē has proven applicable in real-world scenarios, locating VFs for 43 of the 50 latest vulnerabilities with reasonable effort and reducing the false positives of SCA tools by 78-89%.
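To make concrete why reachability analysis needs the VF, the following minimal sketch runs a breadth-first search over a call graph from an application's entry points to a known VF. The graph, the function names, and the helper `find_call_chain` are hypothetical; real SCA tools build call graphs with language-specific analyzers, and this is not VFArchē's implementation.

```python
from collections import deque


def find_call_chain(call_graph, entry_points, vulnerable_function):
    """Return a shortest call chain from any entry point to the VF, or None.

    call_graph: dict mapping a caller name to a list of callee names.
    """
    queue = deque((entry, [entry]) for entry in entry_points)
    visited = set(entry_points)
    while queue:
        fn, path = queue.popleft()
        if fn == vulnerable_function:
            return path
        for callee in call_graph.get(fn, ()):
            if callee not in visited:
                visited.add(callee)
                queue.append((callee, path + [callee]))
    return None  # VF is not reachable: the CVE alert is a likely false positive


# Hypothetical downstream application calling into a dependency "lib".
graph = {
    "app.main": ["app.parse", "lib.render"],
    "app.parse": ["lib.decode"],
    "lib.decode": ["lib.unsafe_copy"],
    "lib.render": [],
}
chain = find_call_chain(graph, ["app.main"], "lib.unsafe_copy")
unreachable = find_call_chain(graph, ["app.main"], "lib.other_bug")
```

Here `chain` is the path from `app.main` to `lib.unsafe_copy`, while `unreachable` is `None`. Without a VF to use as the search target, a tool can only report that the vulnerable dependency is present, which is the source of the false positives discussed above.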
Problem

Research questions and friction points this paper is trying to address.

Locating vulnerable functions in OSS without patch information
Reducing false positives in Software Composition Analysis tools
Bridging lexical gaps between vulnerability descriptions and code
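The lexical gap in the last item can be illustrated with a naive baseline: rank functions by bag-of-words cosine similarity between the vulnerability description and each function's name and body. This is a deliberately simple sketch of the kind of purely lexical matching that struggles here, not the matching technique VFArchē uses; all names and the sample data are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Split text and identifiers (camelCase, snake_case) into lowercase tokens."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", text)
    return [p.lower() for p in parts]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def rank_functions(description, functions):
    """Rank function names by lexical similarity to the description."""
    desc = Counter(tokenize(description))
    scored = [
        (cosine(desc, Counter(tokenize(name + " " + body))), name)
        for name, body in functions.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


description = "Heap buffer overflow when parsing a crafted header"
functions = {
    "parseHeader": "memcpy(dst, src, len)",
    "logMessage": "print(msg)",
}
ranking = rank_functions(description, functions)
```

In this toy case `parseHeader` ranks first only because the token "header" happens to appear in both texts. "parsing" does not match "parse", and "buffer overflow" shares no token with `memcpy`, which is the gap between descriptions and source code that purely lexical matching cannot close.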
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-mode framework for locating vulnerable functions
Combines patch analysis and description-based search
Reduces false positives of SCA tools by 78-89%
Authors

Lyuye Zhang: Postdoc, Nanyang Technological University. Interests: Program Analysis, Open source, Open source security, Software supply chain, Software maintenance
Jian Zhang: School of Software, Beihang University, China
Kaixuan Li: College of Computing and Data Science, Nanyang Technological University, Singapore
Chong Wang: College of Computing and Data Science, Nanyang Technological University, Singapore
Chengwei Liu: Research Assistant Professor, Nanyang Technological University. Interests: Open Source Security, Software Supply Chain Security, Program Analysis, Software Maintenance
Jiahui Wu: College of Computing and Data Science, Nanyang Technological University, Singapore
Sen Chen: Professor, Nankai University. Interests: Software Security, Vulnerability, Malware, Software Supply Chain Security
Yaowen Zheng: Institute of Information Engineering, Chinese Academy of Sciences. Interests: System security, IoT Security
Yang Liu: College of Computing and Data Science, Nanyang Technological University, Singapore