Focus on What Matters: Fisher-Guided Adaptive Multimodal Fusion for Vulnerability Detection

Date: 2026-01-05
Venue: arXiv.org
Citations: 0 (influential: 0)

AI Summary
This work addresses the challenges of redundancy between code sequences and graph representations in multimodal vulnerability detection, as well as the performance degradation caused by noisy or low-quality graph modalities that dilute discriminative signals from dominant modalities. To this end, the authors propose TaCCS-DFA, a novel framework that, for the first time, introduces Fisher information into multimodal fusion. By estimating a task-sensitive, low-rank principal Fisher subspace online, the method constrains cross-modal attention directions and adds an adaptive gating mechanism that dynamically modulates the contribution of the graph modality, thereby suppressing noise propagation. This yields a task-oriented, low-rank fusion strategy with a theoretically tighter risk bound than full-spectrum attention. Experiments demonstrate significant improvements over strong baselines on BigVul, Devign, and ReVeal; notably, when built upon CodeT5, it achieves an F1 score of 87.80% on BigVul, 6.3 percentage points higher than Vul-LMGNNs, while maintaining low calibration error and computational overhead.
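The fusion step described above can be sketched in a few lines: project the graph-modality features onto a low-rank Fisher subspace, then blend them with the sequence features through an adaptive gate. This is a minimal illustration of the general idea, not the paper's implementation; all function and parameter names (`fuse`, `fisher_basis`, `gate_weight`) are hypothetical.

```python
import numpy as np

def fuse(ncs_feat, cpg_feat, fisher_basis, gate_weight):
    """Sketch of Fisher-guided gated fusion (illustrative names only).

    ncs_feat:     sequence-modality feature vector (d,)
    cpg_feat:     graph-modality feature vector (d,)
    fisher_basis: (d, r) matrix whose orthonormal columns span the
                  top-r principal Fisher directions
    gate_weight:  (2d,) weight vector for the scalar gate
    """
    # Restrict the graph features to the task-sensitive subspace,
    # discarding directions with little Fisher (task) relevance.
    proj = fisher_basis @ (fisher_basis.T @ cpg_feat)
    # Adaptive scalar gate in (0, 1): down-weights a noisy graph modality.
    g = 1.0 / (1.0 + np.exp(-gate_weight @ np.concatenate([ncs_feat, proj])))
    return ncs_feat + g * proj
```

With an untrained (zero) gate the graph contribution enters at half strength; training the gate lets the model learn when to trust the graph modality at all.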

๐Ÿ“ Abstract
Software vulnerability detection can be formulated as a binary classification problem that determines whether a given code snippet contains security defects. Existing multimodal methods typically fuse Natural Code Sequence (NCS) representations extracted by pretrained models with Code Property Graph (CPG) representations extracted by graph neural networks, under the implicit assumption that introducing an additional modality necessarily yields information gain. Through empirical analysis, we demonstrate the limitations of this assumption: pretrained models already encode substantial structural information implicitly, leading to strong overlap between the two modalities; moreover, graph encoders are generally less effective than pretrained language models in feature extraction. As a result, naive fusion not only struggles to obtain complementary signals but can also dilute effective discriminative cues due to noise propagation. To address these challenges, we propose a task-conditioned complementary fusion strategy that uses Fisher information to quantify task relevance, transforming cross-modal interaction from full-spectrum matching into selective fusion within a task-sensitive subspace. Our theoretical analysis shows that, under an isotropic perturbation assumption, this strategy significantly tightens the upper bound on the output error. Based on this insight, we design the TaCCS-DFA framework, which combines online low-rank Fisher subspace estimation with an adaptive gating mechanism to enable efficient task-oriented fusion. Experiments on the BigVul, Devign, and ReVeal benchmarks demonstrate that TaCCS-DFA delivers up to a 6.3-point gain in F1 score with only a 3.4% increase in inference latency, while maintaining low calibration error.
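The "online low-rank Fisher subspace estimation" mentioned in the abstract can be illustrated with the standard empirical-Fisher recipe: average the outer products of per-sample loss gradients and keep the top eigenvectors. This is a hedged sketch under that common approximation, not the paper's actual estimator; the function name and batch-wise form are assumptions.

```python
import numpy as np

def update_fisher_subspace(grads, rank):
    """Illustrative low-rank Fisher subspace estimate from one batch.

    grads: list of per-sample loss-gradient vectors, each of shape (d,)
    rank:  number of principal Fisher directions to retain
    Returns a (d, rank) matrix of orthonormal basis vectors.
    """
    # Empirical Fisher: average outer product of per-sample gradients.
    F = sum(np.outer(g, g) for g in grads) / len(grads)
    # eigh returns eigenvalues in ascending order; take the top `rank`
    # eigenvectors as the task-sensitive principal subspace.
    eigvals, eigvecs = np.linalg.eigh(F)
    return eigvecs[:, -rank:]
```

In an online setting one would typically maintain a running average of `F` (or a sketch of it) across batches rather than recompute it from scratch, which keeps the per-step cost low.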
Problem

Research questions and friction points this paper is trying to address.

vulnerability detection
multimodal fusion
redundancy
modality quality
discriminative signal
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher information
adaptive multimodal fusion
vulnerability detection
task-oriented fusion
low-rank subspace
Authors

Yun Bian (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
Yi Chen (Institute of Automation, Chinese Academy of Sciences)
HaiQuan Wang (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
ShiHao Li (Chengdu Institute of Computer Applications, Chinese Academy of Sciences, China)
Zhe Cui (Beijing University of Posts and Telecommunications)