🤖 AI Summary
To address the class imbalance problem in C/C++ vulnerability detection, which drives high false-positive rates, and the limited interpretability of existing methods, this paper proposes an edge-aware graph attention model based on Code Property Graphs (CPGs). Methodologically: (i) we construct a CPG integrating syntactic, control-flow, and data-flow information; (ii) we design a dual-channel node embedding (structural + semantic) coupled with an edge-type-aware attention mechanism to enhance relational modeling; and (iii) we adopt a class-weighted cross-entropy loss to mitigate class imbalance and incorporate critical code region localization to improve interpretability. Evaluated on the ReVeal dataset, the model achieves 88.25% accuracy and a 48.23% F1 score, relative improvements of 4.6% and 16.9% over the ReVeal baseline, and significantly surpasses mainstream static analysis tools.
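The summary does not give the exact attention formulation, but the core idea of edge-type-aware attention can be sketched in a few lines: the attention logit for an edge depends not only on the two node embeddings but also on a learned embedding of the edge's type, so AST, CFG, and DFG relations in the CPG are weighted differently. The dimensions, the `tanh` nonlinearity, and all variable names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8              # node embedding dimension (illustrative)
n_edge_types = 3   # e.g. AST / CFG / DFG edges in a CPG (assumed)

H = rng.normal(size=(5, d))              # dual-channel node embeddings (stand-in)
E = rng.normal(size=(n_edge_types, d))   # learnable edge-type embeddings
a = rng.normal(size=3 * d)               # attention vector over [h_i, h_j, e_ij]

def attn_logit(i, j, etype):
    # GAT-style score, extended so it also depends on the edge type:
    # concatenate source node, target node, and edge-type embeddings.
    z = np.concatenate([H[i], H[j], E[etype]])
    return np.tanh(a @ z)  # a GAT would use LeakyReLU; tanh keeps this simple

def attn_weights(i, neighbors_with_types):
    # softmax over node i's incident edges -> per-edge attention weights
    logits = np.array([attn_logit(i, j, t) for j, t in neighbors_with_types])
    ex = np.exp(logits - logits.max())
    return ex / ex.sum()
```

Because the edge-type embedding enters the logit, the same pair of nodes receives a different attention weight when connected by, say, a data-flow edge versus a syntax edge, which is the relational signal the model exploits.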
📝 Abstract
Detecting security vulnerabilities in source code remains challenging, particularly due to class imbalance in real-world datasets, where vulnerable functions are under-represented. Existing learning-based methods often optimise for recall, leading to high false positive rates and reduced usability in development workflows. Furthermore, many approaches lack explainability, limiting their adoption in security triage. This paper presents ExplainVulD, a graph-based framework for vulnerability detection in C/C++ code. The method constructs Code Property Graphs and represents nodes using dual-channel embeddings that capture both semantic and structural information. These are processed by an edge-aware attention mechanism that incorporates edge-type embeddings to distinguish among program relations. To address class imbalance, the model is trained using a class-weighted cross-entropy loss. ExplainVulD achieves a mean accuracy of 88.25 percent and an F1 score of 48.23 percent across 30 independent runs on the ReVeal dataset. These results represent relative improvements of 4.6 percent in accuracy and 16.9 percent in F1 score over the ReVeal model, a prior learning-based method. The framework also outperforms static analysis tools, with relative gains of 14.0 to 14.1 percent in accuracy and 132.2 to 201.2 percent in F1 score. Beyond improved detection performance, ExplainVulD produces explainable outputs by identifying the most influential code regions within each function, supporting transparency and trust in security triage.
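The class-weighted cross-entropy loss mentioned above can be sketched as follows. The abstract does not state the paper's exact weighting scheme, so this minimal NumPy example assumes inverse-frequency weights, a common choice: each class weight scales inversely with how often that class appears, so the rare vulnerable class contributes more per sample to the loss.

```python
import numpy as np

def class_weights(labels, n_classes=2):
    # Inverse-frequency weighting (assumed scheme, not necessarily the
    # paper's): w_c = N / (K * N_c), so rarer classes get larger weights.
    counts = np.bincount(labels, minlength=n_classes)
    return len(labels) / (n_classes * counts)

def weighted_cross_entropy(probs, labels, weights):
    # probs: (N, C) predicted class probabilities; labels: (N,) int labels.
    eps = 1e-12  # guard against log(0)
    per_sample = -np.log(probs[np.arange(len(labels)), labels] + eps)
    return np.mean(weights[labels] * per_sample)
```

Up-weighting the minority class penalises missed vulnerable functions more heavily than a plain cross-entropy would, which counteracts the imbalance that otherwise pushes the model toward predicting "not vulnerable" everywhere.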