VFArchē: A Dual-Mode Framework for Locating Vulnerable Functions in Open-Source Software

📅 2025-06-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Precise vulnerable function (VF) localization in open-source software remains challenging for three reasons: existing vulnerability databases carry no VF annotations, searching without patches suffers from heavy noise and lexical gaps between vulnerability descriptions and source code, and over 26% of VFs lie outside the functions modified by patches. Method: We propose VFArchē, a dual-mode framework that covers VF localization both with and without patches. With patches, it jointly leverages call-chain reachability analysis and code-change mining; without patches, it combines vulnerability description semantics with cross-modal source-code similarity matching for fine-grained, source-level VF identification. Contribution/Results: VFArchē combines call-graph analysis with multi-granularity semantic alignment to overcome patch dependency and lexical mismatch. On the constructed benchmark, it achieves 1.3× and 1.9× the mean reciprocal rank (MRR) of the best baselines in the patch-present and patch-absent modes, respectively, locates VFs for 43 of 50 newly disclosed vulnerabilities, and reduces SCA tool false positives by 78%–89%.
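For readers unfamiliar with the metric, the following is a minimal sketch of how mean reciprocal rank is conventionally computed over ranked candidate lists. It is not taken from the paper: the function names and the toy data are illustrative assumptions, and the paper's exact evaluation protocol (for example, how multiple VFs per vulnerability are handled) is not described on this page.

```python
from typing import Hashable, Sequence, Set, Tuple


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/position of the first relevant item, or 0.0 if none is ranked."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    queries: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Average the reciprocal rank over all queries (here: vulnerabilities)."""
    if not queries:
        return 0.0
    return sum(reciprocal_rank(r, rel) for r, rel in queries) / len(queries)


# Hypothetical rankings: each entry pairs a ranked list of candidate
# functions with the set of true vulnerable functions for one vulnerability.
example_queries = [
    (["parse_header", "decode_chunk", "read_body"], {"decode_chunk"}),  # rank 2
    (["unpack_entry", "open_archive"], {"unpack_entry"}),               # rank 1
    (["init_logger", "flush_logger"], {"format_message"}),              # not found
]
example_mrr = mean_reciprocal_rank(example_queries)
```

Under this definition, a higher MRR means the true VF tends to appear nearer the top of the ranked list, which is why a 1.3× or 1.9× ratio over a baseline translates into less manual inspection per vulnerability.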

📝 Abstract

Software Composition Analysis (SCA) has become pivotal in addressing vulnerabilities inherent in software project dependencies. In particular, reachability analysis is increasingly used in Open-Source Software (OSS) projects to identify reachable vulnerabilities (e.g., CVEs) through call graphs, enabling a focus on exploitable risks. Performing reachability analysis typically requires knowing the vulnerable function (VF) so that call chains from downstream applications can be traced to it. However, this crucial information is usually unavailable in modern vulnerability databases such as NVD. While directly extracting VFs from the functions modified in vulnerability patches is intuitive, patches are not always available. Moreover, our preliminary study shows that over 26% of VFs are not among the modified functions. Meanwhile, simply ignoring patches when searching for vulnerable functions suffers from overwhelming noise and lexical gaps between descriptions and source code. Given that almost half of the vulnerabilities come with patches, a holistic solution that handles scenarios both with and without patches is required. To meet real-world needs and automatically localize VFs, we present VFArchē, a dual-mode approach designed for disclosed vulnerabilities, applicable whether or not patch links are available. Experimental results on our constructed benchmark dataset demonstrate significant efficacy on three metrics, with VFArchē achieving 1.3x and 1.9x the Mean Reciprocal Rank of the best baselines in the Patch-present and Patch-absent modes, respectively. Moreover, VFArchē has proven its applicability in real-world scenarios by successfully locating VFs for 43 of the 50 latest vulnerabilities with reasonable effort and reducing the false positives of SCA tools by 78-89%.
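To illustrate why the VF is a prerequisite for reachability analysis, here is a minimal sketch of a call-chain search over a call graph. It is a generic breadth-first search, not VFArchē's implementation: the graph representation (a dict from caller to callees), the function name `find_call_chain`, and all function names in the toy graph are hypothetical.

```python
from collections import deque
from typing import Dict, List, Optional, Sequence


def find_call_chain(
    call_graph: Dict[str, Sequence[str]], entry: str, target: str
) -> Optional[List[str]]:
    """Return a shortest call chain from entry to target, or None if unreachable."""
    if entry == target:
        return [entry]
    parent: Dict[str, Optional[str]] = {entry: None}  # doubles as the visited set
    queue = deque([entry])
    while queue:
        caller = queue.popleft()
        for callee in call_graph.get(caller, ()):
            if callee in parent:
                continue
            parent[callee] = caller
            if callee == target:
                # Walk the parent links back to the entry point.
                chain = [callee]
                while parent[chain[-1]] is not None:
                    chain.append(parent[chain[-1]])
                return chain[::-1]
            queue.append(callee)
    return None


# Hypothetical downstream application that depends on a library.
toy_graph = {
    "app.main": ["app.load_config", "lib.safe_helper"],
    "app.load_config": ["lib.parse"],
    "lib.parse": ["lib.decode_unsafe"],
    "lib.safe_helper": [],
}
reachable_chain = find_call_chain(toy_graph, "app.main", "lib.decode_unsafe")
unreachable_chain = find_call_chain(toy_graph, "app.main", "lib.unused_vulnerable")
```

In this sketch, an SCA tool that only matches dependency versions would flag both library functions, whereas a reachability check reports only the one with an actual call chain from the application. That difference is the source of the false-positive reduction the abstract describes, and it depends entirely on knowing which function to use as `target`.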

Problem

Research questions and friction points this paper is trying to address.

Locating vulnerable functions in OSS whether or not patch information is available

Reducing false positives in Software Composition Analysis tools

Bridging lexical gaps between vulnerability descriptions and code

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-mode framework for locating vulnerable functions

Combines patch analysis and description-based search

Reduces false positives reported by SCA tools
