Toward Realistic Evaluations of Just-In-Time Vulnerability Prediction

📅 2025-07-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing Just-In-Time Vulnerability Prediction (JIT-VP) evaluations are unrealistic: they rely solely on vulnerability-introducing and vulnerability-fixing commits and ignore the vulnerability-neutral commits that make up the vast majority of real commit histories, which leads to substantial performance overestimation. Method: We propose a more realistic evaluation setting and introduce a large-scale, publicly available dataset of over one million commits from FFmpeg and the Linux kernel that mixes vulnerability-related and vulnerability-neutral commits. We benchmark eight state-of-the-art JIT-VP techniques and try to mitigate the severe class imbalance with customized loss functions, oversampling, and undersampling. Contribution/Results: Under the realistic data distribution, predictive performance declines sharply; for example, the average PR-AUC on Linux drops from 0.805 to 0.016, and none of the imbalance-handling techniques examined meaningfully alleviates the degradation. Our work shows that current JIT-VP approaches fall short in practical settings and provides a more representative benchmark for future research.

📝 Abstract
Modern software systems are increasingly complex, presenting significant challenges in quality assurance. Just-in-time vulnerability prediction (JIT-VP) is a proactive approach to identifying vulnerable commits and providing early warnings about potential security risks. However, we observe that current JIT-VP evaluations rely on an idealized setting, where the evaluation datasets are artificially balanced, consisting exclusively of vulnerability-introducing and vulnerability-fixing commits. To address this limitation, this study assesses the effectiveness of JIT-VP techniques under a more realistic setting that includes both vulnerability-related and vulnerability-neutral commits. To enable a reliable evaluation, we introduce a large-scale public dataset comprising over one million commits from FFmpeg and the Linux kernel. Our empirical analysis of eight state-of-the-art JIT-VP techniques reveals a significant decline in predictive performance when applied to real-world conditions; for example, the average PR-AUC on Linux drops 98% from 0.805 to 0.016. This discrepancy is mainly attributed to the severe class imbalance in real-world datasets, where vulnerability-introducing commits constitute only a small fraction of all commits. To mitigate this issue, we explore the effectiveness of widely adopted techniques for handling dataset imbalance, including customized loss functions, oversampling, and undersampling. Surprisingly, our experimental results indicate that these techniques are ineffective in addressing the imbalance problem in JIT-VP. These findings underscore the importance of realistic evaluations of JIT-VP and the need for domain-specific techniques to address data imbalance in such scenarios.
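The link between class imbalance and PR-AUC can be illustrated with a small simulation. The sketch below is not the paper's evaluation code: the Gaussian score distributions, the sample sizes, and the helper names `average_precision` and `simulate` are assumptions chosen for illustration. It scores positives and negatives with one fixed synthetic "model", then measures PR-AUC (computed as average precision) once on a balanced set and once after adding many more negatives. It also checks the 98% figure quoted in the abstract.

```python
import random


def average_precision(labels, scores):
    """PR-AUC estimated as average precision: mean precision at each positive's rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive example")
    true_pos = 0
    ap = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            true_pos += 1
            ap += true_pos / rank  # precision at this recall step
    return ap / total_pos


def simulate(n_pos, n_neg, seed=0):
    """Score a synthetic evaluation set with a fixed, moderately good 'model'.

    Assumption: positive scores ~ N(1, 1), negative scores ~ N(0, 1).
    """
    rng = random.Random(seed)
    scores = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    scores += [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    labels = [1] * n_pos + [0] * n_neg
    return average_precision(labels, scores)


balanced = simulate(n_pos=500, n_neg=500)       # idealized 1:1 setting
imbalanced = simulate(n_pos=500, n_neg=50_000)  # 1:100, closer to a real commit stream

# Relative drop reported in the abstract for Linux: 0.805 -> 0.016.
relative_drop = (0.805 - 0.016) / 0.805

print(f"balanced PR-AUC:   {balanced:.3f}")
print(f"imbalanced PR-AUC: {imbalanced:.3f}")
print(f"relative drop in the abstract: {relative_drop:.1%}")
```

The model's scores for the positives are identical in both runs; only the number of negatives changes. Each extra negative can only push positives further down the ranking, so precision at every recall level falls, and PR-AUC falls with it. For comparison, a random scorer's expected PR-AUC is roughly the positive-class fraction of the evaluation set.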
Problem

Research questions and friction points this paper is trying to address.

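Evaluating JIT-VP on realistic, imbalanced commit datasets rather than artificially balanced ones
Quantifying the performance drop of JIT-VP techniques under real-world conditions
Testing whether common imbalance-handling techniques (customized loss functions, oversampling, undersampling) help JIT-VP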
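The difference between the idealized and the realistic evaluation setting comes down to which commits enter the test set. The following minimal sketch uses a hypothetical `Commit` record, hypothetical `kind` labels ("introducing", "fixing", "neutral"), and invented toy counts; none of these come from the paper's dataset schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    sha: str
    kind: str  # hypothetical labels: "introducing", "fixing", or "neutral"


def idealized_eval_set(commits):
    """Idealized setting: only vulnerability-introducing and -fixing commits."""
    return [c for c in commits if c.kind in ("introducing", "fixing")]


def realistic_eval_set(commits):
    """Realistic setting: every commit, including vulnerability-neutral ones."""
    return list(commits)


def positive_rate(commits):
    """Fraction of commits that are vulnerability-introducing (the positive class)."""
    return sum(c.kind == "introducing" for c in commits) / len(commits)


# Toy history with made-up proportions: 2 introducing, 2 fixing, 96 neutral.
history = (
    [Commit(f"i{n}", "introducing") for n in range(2)]
    + [Commit(f"f{n}", "fixing") for n in range(2)]
    + [Commit(f"n{n}", "neutral") for n in range(96)]
)

print(positive_rate(idealized_eval_set(history)))  # 0.5
print(positive_rate(realistic_eval_set(history)))  # 0.02
```

In this toy history the idealized set is perfectly balanced, while the realistic set has a 2% positive rate. That gap in class balance is the property the realistic setting is designed to expose.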
Innovation

Methods, ideas, or system contributions that make the work stand out.

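Evaluates on a realistic dataset that includes vulnerability-neutral commits
Introduces a large-scale public dataset of over one million commits from FFmpeg and the Linux kernel
Examines customized loss functions, oversampling, and undersampling for JIT-VP and finds them ineffective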
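The three families of imbalance-handling techniques named above can be sketched generically. The paper does not specify its exact loss functions or sampling procedures here, so everything below is a stand-in: random undersampling, random oversampling with replacement, and a class-weighted binary cross-entropy. The function names and parameters (`ratio`, `pos_weight`) are illustrative assumptions.

```python
import math
import random


def undersample(labels, ratio=1.0, seed=0):
    """Keep all positives and at most ratio * n_pos randomly chosen negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    keep = min(len(neg), int(ratio * len(pos)))
    return sorted(pos + rng.sample(neg, keep))


def oversample(labels, seed=0):
    """Duplicate random positives (with replacement) until they match the negatives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos:
        raise ValueError("cannot oversample without positive examples")
    extra = [rng.choice(pos) for _ in range(max(0, len(neg) - len(pos)))]
    return pos + neg + extra


def weighted_bce(y, p, pos_weight, eps=1e-12):
    """Binary cross-entropy where errors on positives cost pos_weight times more."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))


# Toy training labels: 3 positives among 100 commits (made-up numbers).
train_labels = [1] * 3 + [0] * 97
under_idx = undersample(train_labels)
over_idx = oversample(train_labels)

print(len(under_idx), len(over_idx))     # 6 194
print(weighted_bce(1, 0.5, pos_weight=10.0))
```

In this sketch the resampled indices are meant for the training split only. The evaluation split keeps its natural class distribution; otherwise the measurement would fall back into the idealized setting the paper criticizes.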
Duong Nguyen
School of Communication and Information Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
Thanh Le-Cong
School of Computing and Information Systems, The University of Melbourne
Software Engineering, Machine Learning, AI4Code, Program Repair, Program Analysis
Triet Huynh Minh Le
School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
M. Ali Babar
Professor of Software Engineering, The University of Adelaide, Australia
Software Security & Privacy, Big Data Platforms & Architectures, Empirical Software Engineering, Software Architecture
Quyet-Thang Huynh
School of Communication and Information Technology, Hanoi University of Science and Technology, Hanoi, Vietnam