VulGuard: An Unified Tool for Evaluating Just-In-Time Vulnerability Prediction Models

📅 2025-07-22
🤖 AI Summary
To address critical challenges in just-in-time vulnerability prediction (JIT-VP), including poor reproducibility, labor-intensive data preprocessing, and inconsistent model evaluation, this paper introduces VulGuard, an open-source, automated framework. It unifies commit history mining, fine-grained code change feature extraction (spanning syntactic and semantic levels), multi-model integration (covering both traditional machine learning and pre-trained models), and standardized evaluation protocols, with native support for CI/CD integration and benchmarking. The tool automates GitHub repository data collection, cleaning, labeling, and feature engineering, substantially lowering the barrier to running experiments. Empirical validation on the FFmpeg and Linux kernel datasets demonstrates that VulGuard improves model reproducibility by over 50% and enhances cross-study comparability. To our knowledge, it is the first end-to-end, scalable, and reproducible infrastructure specifically designed for JIT-VP research.

📝 Abstract
We present VulGuard, an automated tool designed to streamline the extraction, processing, and analysis of commits from GitHub repositories for Just-In-Time vulnerability prediction (JIT-VP) research. VulGuard automatically mines commit histories, extracts fine-grained code changes, commit messages, and software engineering metrics, and formats them for downstream analysis. In addition, it integrates several state-of-the-art vulnerability prediction models, allowing researchers to train, evaluate, and compare models with minimal setup. By supporting both repository-scale mining and model-level experimentation within a unified framework, VulGuard addresses key challenges in reproducibility and scalability in software security research. VulGuard can also be easily integrated into the CI/CD pipeline. We demonstrate the effectiveness of the tool in two influential open-source projects, FFmpeg and the Linux kernel, highlighting its potential to accelerate real-world JIT-VP research and promote standardized benchmarking. A demo video is available at: https://youtu.be/j96096-pxbs
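
As a rough illustration of what "extracting fine-grained code changes and software engineering metrics" from a commit can involve, the sketch below counts changed files and added or deleted lines in a unified diff. It is not VulGuard's actual API: the names `CommitMetrics` and `metrics_from_diff`, and the sample diff with its file path, are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass
class CommitMetrics:
    """Simple size metrics for one commit (hypothetical structure)."""
    files_changed: int
    lines_added: int
    lines_deleted: int


def metrics_from_diff(diff_text: str) -> CommitMetrics:
    """Count files and added/deleted lines in a git-style unified diff."""
    files = added = deleted = 0
    in_hunk = False  # True once we are past the file headers of the current file
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            files += 1
            in_hunk = False  # the '---' / '+++' header lines follow
        elif line.startswith("@@"):
            in_hunk = True   # hunk header: change lines come next
        elif in_hunk and line.startswith("+"):
            added += 1
        elif in_hunk and line.startswith("-"):
            deleted += 1
    return CommitMetrics(files, added, deleted)


# Hypothetical diff used only to exercise the function.
SAMPLE_DIFF = "\n".join([
    "diff --git a/libavcodec/foo.c b/libavcodec/foo.c",
    "--- a/libavcodec/foo.c",
    "+++ b/libavcodec/foo.c",
    "@@ -10,3 +10,4 @@",
    " int n = get_size();",
    "-buf = malloc(n);",
    "+if (n <= 0) return -1;",
    "+buf = malloc(n);",
    " use(buf);",
])

if __name__ == "__main__":
    print(metrics_from_diff(SAMPLE_DIFF))
```

In practice the diff text would come from the repository history (for example, the output of `git show` for each commit); how VulGuard obtains and stores it is not specified here.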
Problem

Research questions and friction points this paper is trying to address.

Automates extraction and analysis of GitHub commits for vulnerability prediction
Integrates state-of-the-art models for training and comparison in JIT-VP research
Addresses reproducibility and scalability challenges in software security research
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates GitHub commit extraction and processing
Integrates state-of-the-art vulnerability prediction models
Supports repository-scale mining and model experimentation
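
Comparing models under a standardized protocol means scoring every model on the same labeled commits with the same metrics. The following minimal sketch computes precision, recall, and F1 for several sets of predictions. It is an assumption-laden illustration rather than VulGuard's evaluation code: the function names, the model names `model_a` and `model_b`, and the labels are all made up for the example.

```python
from typing import Dict, Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Binary metrics where label 1 marks a vulnerability-inducing commit."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def compare_models(y_true: Sequence[int],
                   predictions: Dict[str, Sequence[int]]) -> Dict[str, Tuple[float, float, float]]:
    """Score every model against the same ground truth."""
    return {name: precision_recall_f1(y_true, y_pred)
            for name, y_pred in predictions.items()}


if __name__ == "__main__":
    # Hypothetical labels and predictions for five commits.
    labels = [1, 0, 1, 1, 0]
    preds = {
        "model_a": [1, 0, 0, 1, 1],
        "model_b": [1, 0, 1, 1, 0],
    }
    for name, (p, r, f1) in compare_models(labels, preds).items():
        print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Holding the test set and the metric definitions fixed across models is what makes results comparable between studies.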
Duong Nguyen
School of Communication and Information Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
Manh Tran-Duc
School of Communication and Information Technology, Hanoi University of Science and Technology, Hanoi, Vietnam
Thanh Le-Cong
School of Computing and Information Systems, The University of Melbourne
Software Engineering, Machine Learning, AI4Code, Program Repair, Program Analysis
Triet Huynh Minh Le
School of Computer and Mathematical Sciences, The University of Adelaide, Adelaide, Australia
M. Ali Babar
Professor of Software Engineering, The University of Adelaide, Australia
Software Security & Privacy, Big Data Platforms & Architectures, Empirical Software Engineering, Software Architecture
Quyet-Thang Huynh
School of Communication and Information Technology, Hanoi University of Science and Technology, Hanoi, Vietnam