Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Date: 2026-01-05
Venue: arXiv.org
Citations: 0 (influential: 0)

AI Summary
This work addresses the challenges of redundancy between code sequences and graph representations in multimodal vulnerability detection, as well as the performance degradation caused by noisy or low-quality graph modalities that dilute discriminative signals from dominant modalities. To this end, the authors propose TaCCS-DFA, a novel framework that, for the first time, introduces Fisher information into multimodal fusion. By estimating a task-sensitive, low-rank principal Fisher subspace online, the method constrains cross-modal attention directions and adds an adaptive gating mechanism that dynamically modulates the contribution of the graph modality, thereby suppressing noise propagation. This yields a task-oriented, low-rank fusion strategy with a theoretically tighter risk bound than full-spectrum attention. Experiments demonstrate significant improvements over strong baselines on BigVul, Devign, and ReVeal; notably, when built upon CodeT5, it achieves an F1 score of 87.80% on BigVul, 6.3 percentage points higher than Vul-LMGNNs, while maintaining low calibration error and computational overhead.
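The fusion step described above can be sketched in a few lines: project the graph-modality features onto a low-rank Fisher subspace, then blend them with the sequence features through an adaptive gate. This is a minimal illustration of the general idea, not the paper's implementation; all function and parameter names (`fuse`, `fisher_basis`, `gate_weight`) are hypothetical.

```python
import numpy as np

def fuse(ncs_feat, cpg_feat, fisher_basis, gate_weight):
    """Sketch of Fisher-guided gated fusion (illustrative names only).

    ncs_feat:     sequence-modality feature vector (d,)
    cpg_feat:     graph-modality feature vector (d,)
    fisher_basis: (d, r) matrix whose orthonormal columns span the
                  top-r principal Fisher directions
    gate_weight:  (2d,) weight vector for the scalar gate
    """
    # Restrict the graph features to the task-sensitive subspace,
    # discarding directions with little Fisher (task) relevance.
    proj = fisher_basis @ (fisher_basis.T @ cpg_feat)
    # Adaptive scalar gate in (0, 1): down-weights a noisy graph modality.
    g = 1.0 / (1.0 + np.exp(-gate_weight @ np.concatenate([ncs_feat, proj])))
    return ncs_feat + g * proj
```

With an untrained (zero) gate the graph contribution enters at half strength; training the gate lets the model learn when to trust the graph modality at all.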

๐Ÿ“ Abstract
Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
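The "online low-rank Fisher subspace estimation" mentioned in the abstract can be illustrated with the standard empirical-Fisher recipe: average the outer products of per-sample loss gradients and keep the top eigenvectors. This is a hedged sketch under that common approximation, not the paper's actual estimator; the function name and batch-wise form are assumptions.

```python
import numpy as np

def update_fisher_subspace(grads, rank):
    """Illustrative low-rank Fisher subspace estimate from one batch.

    grads: list of per-sample loss-gradient vectors, each of shape (d,)
    rank:  number of principal Fisher directions to retain
    Returns a (d, rank) matrix of orthonormal basis vectors.
    """
    # Empirical Fisher: average outer product of per-sample gradients.
    F = sum(np.outer(g, g) for g in grads) / len(grads)
    # eigh returns eigenvalues in ascending order; take the top `rank`
    # eigenvectors as the task-sensitive principal subspace.
    eigvals, eigvecs = np.linalg.eigh(F)
    return eigvecs[:, -rank:]
```

In an online setting one would typically maintain a running average of `F` (or a sketch of it) across batches rather than recompute it from scratch, which keeps the per-step cost low.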
Problem

Research questions and friction points this paper is trying to address.

vulnerability detection
multimodal fusion
redundancy
modality quality
discriminative signal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher information
adaptive multimodal fusion
vulnerability detection
task-oriented fusion
low-rank subspace
Authors

Yun Bian (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
Yi Chen (Institute of Automation, Chinese Academy of Sciences)
HaiQuan Wang (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
ShiHao Li (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
Zhe Cui (Beijing University of Posts and Telecommunications)