PatchSeeker: Mapping NVD Records to their Vulnerability-fixing Commits with LLM Generated Commits and Embeddings

📅 2025-09-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of automatically and precisely mapping National Vulnerability Database (NVD) entries to their corresponding vulnerability-fixing commits (VFCs). Conventional approaches suffer from cross-modal matching bias because commit messages are often semantically sparse. To mitigate this, we propose a large language model (LLM)-based semantic enhancement framework. Our method uses LLMs to generate vulnerability-enriched commit summaries, which serve as a semantic bridge between natural-language NVD descriptions and code changes, and integrates text embeddings for fine-grained semantic alignment. On the benchmark dataset, our approach improves Mean Reciprocal Rank (MRR) by 59.3% and Recall@10 by 27.9% over the best-performing baseline, Prospector, and an extended evaluation on recent CVEs further confirms its effectiveness. The core contribution is the integration of LLM-driven semantic summarization into the VFC–NVD mapping task, which improves both accuracy and robustness in cross-modal vulnerability localization.
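
The results above are reported as Mean Reciprocal Rank (MRR) and Recall@10. The sketch below shows one common way to compute these ranking metrics over a set of CVEs. It assumes that each CVE comes with a ranked list of candidate commit hashes and a set of known fixing commits. The function names are illustrative and are not taken from PatchSeeker. Recall@k is defined here as the fraction of a CVE's fixing commits that appear in the top k. Some evaluations instead count a hit if any fixing commit appears, so the paper's exact definition may differ.

```python
from typing import Sequence


def reciprocal_rank(ranked: Sequence[str], true_fixes: set[str]) -> float:
    """Return 1/rank of the first true fixing commit, or 0.0 if none is ranked."""
    for position, commit in enumerate(ranked, start=1):
        if commit in true_fixes:
            return 1.0 / position
    return 0.0


def recall_at_k(ranked: Sequence[str], true_fixes: set[str], k: int = 10) -> float:
    """Fraction of the true fixing commits that appear among the top-k candidates."""
    if not true_fixes:
        return 0.0
    return len(set(ranked[:k]) & true_fixes) / len(true_fixes)


def evaluate(queries: list[tuple[Sequence[str], set[str]]], k: int = 10) -> tuple[float, float]:
    """Average MRR and Recall@k over (ranked_candidates, true_fixes) pairs, one pair per CVE."""
    n = len(queries)
    mrr = sum(reciprocal_rank(ranked, fixes) for ranked, fixes in queries) / n
    recall = sum(recall_at_k(ranked, fixes, k) for ranked, fixes in queries) / n
    return mrr, recall


# Toy usage with made-up commit hashes (not data from the paper).
queries = [
    (["c3", "c1", "c7"], {"c1"}),   # fix ranked 2nd: reciprocal rank 0.5, recall 1.0
    (["c9", "c4"], {"c9", "c5"}),   # one of two fixes ranked 1st: reciprocal rank 1.0, recall 0.5
]
print(evaluate(queries))  # (0.75, 0.75)
```

A relative gain such as "59.3% higher MRR" compares the averaged MRR of two systems over the same set of CVEs.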

📝 Abstract
Software vulnerabilities pose serious risks to modern software ecosystems. The National Vulnerability Database (NVD) is the authoritative source for cataloging these vulnerabilities, but it often lacks explicit links to the corresponding Vulnerability-Fixing Commits (VFCs). VFCs encode precise code changes, which makes them essential for vulnerability localization, patch analysis, and dataset construction. Automatically mapping NVD records to their true VFCs is therefore critical. Existing approaches are limited because they rely on sparse, often noisy commit messages and fail to capture the deeper semantics of vulnerability descriptions. To address this gap, we introduce PatchSeeker, a novel method that uses large language models to create rich semantic links between vulnerability descriptions and their VFCs. PatchSeeker generates embeddings from NVD descriptions. It also enhances commit messages by synthesizing detailed summaries for those that are short or uninformative. These generated messages act as a semantic bridge that closes the information gap between natural-language reports and low-level code changes. On the benchmark dataset, PatchSeeker achieves 59.3% higher MRR and 27.9% higher Recall@10 than the best-performing baseline, Prospector. An extended evaluation on recent CVEs further confirms PatchSeeker's effectiveness. An ablation study shows that both the commit message generation method and the choice of backbone LLM contribute positively to PatchSeeker's performance. We also discuss limitations and open challenges to guide future work.
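
According to the abstract, PatchSeeker synthesizes summaries only for commit messages that are short or uninformative. The sketch below shows that gating step. It assumes a minimum word count and a regular expression for generic titles such as auto-generated merge messages, and both are hypothetical. `fake_llm_summary` is a placeholder that stands in for an actual LLM call. It is not the prompt or model the paper uses.

```python
import re
from typing import Callable

# Hypothetical heuristics; the paper's exact criteria for "short or uninformative" may differ.
MIN_WORDS = 5
GENERIC = re.compile(r"^(merge (branch|pull request)|release|bump version)\b", re.IGNORECASE)


def is_uninformative(message: str) -> bool:
    """Flag empty or very short messages, and generic ones such as auto-generated merge titles."""
    text = message.strip()
    return len(text.split()) < MIN_WORDS or GENERIC.match(text) is not None


def enhance_message(message: str, diff: str, summarize: Callable[[str, str], str]) -> str:
    """Keep informative messages; otherwise replace them with a summary produced by `summarize`."""
    return summarize(message, diff) if is_uninformative(message) else message


def fake_llm_summary(message: str, diff: str) -> str:
    """Placeholder for an LLM call: it only lists the lines that the diff adds."""
    added = [
        line[1:].strip()
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]
    return f"{message.strip()}: " + "; ".join(added)


diff = "+++ b/parse.c\n+    if (len > sizeof(buf)) return -1;\n"
print(enhance_message("fix", diff, fake_llm_summary))
print(enhance_message("Add bounds check to prevent heap overflow in parse_header", diff, fake_llm_summary))
```

Passing the summarizer in as a callback keeps the gating logic independent of the model. This lets a real LLM client replace `fake_llm_summary` without changing the selection rule.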

Problem

Mapping NVD records to vulnerability-fixing commits automatically
Bridging information gap between vulnerability descriptions and code changes
Overcoming the limitations of sparse, noisy commit messages

Innovation

LLM-generated commit messages bridge the semantic gap
Embeddings from NVD descriptions enhance semantic matching
Synthesized summaries yield information-rich commit representations
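
The sketch below illustrates how embeddings support the matching step. It ranks candidate commits by the cosine similarity between the NVD description and each commit message, which may already be enhanced. To run without dependencies, it replaces a real text-embedding model with a bag-of-words term-frequency vector. The function names and the scoring scheme are illustrative assumptions and do not reproduce PatchSeeker's implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in for a real text-embedding model: a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_commits(nvd_description: str, commits: dict[str, str]) -> list[tuple[str, float]]:
    """Rank candidate commits (sha -> message) by similarity to the NVD description, best first."""
    query = embed(nvd_description)
    scored = [(sha, cosine(query, embed(msg))) for sha, msg in commits.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy usage with hypothetical commits.
description = "Heap buffer overflow in parse_header allows remote attackers to execute code"
candidates = {
    "a1b2c3": "Update README badges",
    "d4e5f6": "Fix heap buffer overflow in parse_header by checking length",
}
ranking = rank_commits(description, candidates)
print([sha for sha, _ in ranking])  # ['d4e5f6', 'a1b2c3']
```

With a dense embedding model, only `embed` would need to change, and the cosine ranking would stay the same. Unlike a bag-of-words vector, a dense model can score paraphrases such as "out-of-bounds write" and "buffer overflow" as similar.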

Huu Hung Nguyen
Singapore Management University

Anh Tuan Nguyen
Hanoi University of Science and Technology

Thanh Le-Cong
School of Computing and Information Systems, The University of Melbourne
Software Engineering, Machine Learning, AI4Code, Program Repair, Program Analysis

Yikun Li
Postdoctoral Researcher
Artificial Intelligence, Software Engineering, Cyber Security

Han Wei Ang
GovTech

Yide Yin
GovTech

Frank Liauw
Lead Cybersecurity Engineer, Government Technology Agency Singapore

Shar Lwin Khin
Singapore Management University

Ouh Eng Lieh
Singapore Management University

Ting Zhang
Monash University

David Lo
Singapore Management University