🤖 AI Summary
Existing approaches to smart contract vulnerability detection struggle to model control flow and data dependencies simultaneously, and often lack deep semantic understanding, robustness, and interpretability. This work proposes a multimodal graph neural network framework that integrates causal inference with Retrieval-Augmented Generation (RAG), jointly modeling control flow graphs, data flow graphs, and call graphs. By injecting security knowledge via large language models and employing a causal attention mechanism, the method isolates genuine vulnerability signals from spurious correlations. On the primary benchmark, it achieves a Macro F1 score of 91.28%, outperforming the best existing method by up to 39.6 percentage points. It demonstrates strong robustness under adversarial attacks, with F1 dropping only 2.35% and an attack success rate as low as 3%. Moreover, its interpretability, measured by MIoU, reaches 32.51%, significantly surpassing current solutions.
📝 Abstract
Although Graph Neural Networks (GNNs) have shown promise for smart contract vulnerability detection, they still face significant limitations. Homogeneous graph models fail to capture the interplay between control flow and data dependencies, while heterogeneous graph approaches often lack deep semantic understanding, leaving them susceptible to adversarial attacks. Moreover, most black-box models fail to provide explainable evidence, hindering trust in professional audits. To address these challenges, we propose ORACAL (Observable RAG-enhanced Analysis with CausAL reasoning), a heterogeneous multimodal graph learning framework that integrates the Control Flow Graph (CFG), Data Flow Graph (DFG), and Call Graph (CG). ORACAL selectively enriches critical subgraphs with expert-level security context from Retrieval-Augmented Generation (RAG) and Large Language Models (LLMs), and employs a causal attention mechanism to disentangle true vulnerability indicators from spurious correlations. For transparency, the framework adopts PGExplainer to generate subgraph-level explanations that identify vulnerability-triggering paths. Experiments on large-scale datasets demonstrate that ORACAL achieves state-of-the-art performance, outperforming MANDO-HGT, MTVHunter, GNN-SC, and SCVHunter by up to 39.6 percentage points, with a peak Macro F1 of 91.28% on the primary benchmark. ORACAL maintains strong generalization on out-of-distribution datasets, scoring 91.8% on CGT Weakness and 77.1% on DAppScan. In explainability evaluation, PGExplainer achieves 32.51% Mean Intersection over Union (MIoU) against manually annotated vulnerability-triggering paths. Under adversarial attacks, ORACAL limits performance degradation to approximately a 2.35% F1 decrease with an Attack Success Rate (ASR) of only 3%, surpassing SCVHunter and MANDO-HGT, whose ASRs range from 10.91% to 18.73%.
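To make the multimodal graph construction concrete, here is a minimal, illustrative sketch of how CFG, DFG, and CG edge lists might be merged into a single typed heterogeneous graph. This is not the authors' implementation: the node names, edge-type labels, and the toy reentrancy-shaped example below are assumptions for illustration only.

```python
from collections import defaultdict

def build_heterogeneous_graph(cfg_edges, dfg_edges, cg_edges):
    """Merge three edge lists into one adjacency map with typed edges.

    Each value is a list of (destination, edge_type) pairs, so downstream
    models (e.g. a heterogeneous GNN) can treat control-flow, data-flow,
    and call edges as distinct relations.
    """
    graph = defaultdict(list)
    for etype, edges in (("cfg", cfg_edges), ("dfg", dfg_edges), ("cg", cg_edges)):
        for src, dst in edges:
            graph[src].append((dst, etype))
    return dict(graph)

# Hypothetical Solidity-like fragment: a balance update that follows an
# external call (the classic reentrancy shape).
cfg = [("entry", "check_balance"), ("check_balance", "external_call"),
       ("external_call", "update_balance")]
dfg = [("check_balance", "update_balance")]
cg = [("external_call", "withdraw")]

g = build_heterogeneous_graph(cfg, dfg, cg)
print(g["external_call"])  # [('update_balance', 'cfg'), ('withdraw', 'cg')]
```

In this representation, a node such as `external_call` carries both a control-flow successor and a call-graph edge, which is the kind of cross-modal signal a homogeneous graph would flatten away.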