CAMPA: Efficient and Aligned Multimodal Graph Learning via Decoupled Propagation and Aggregation

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing decoupled multimodal graph neural networks are susceptible to modal conflict during both message propagation and aggregation, which leads to cross-modal semantic divergence and misaligned multi-hop feature trajectories. This work identifies that bottleneck and proposes CAMPA, a framework built around a two-stage alignment mechanism. During propagation, cross-modal similarity priors guide message passing without adding parameters; during aggregation, trajectory-level self-attention and cross-attention align dependencies across modalities and hops. Experiments show that CAMPA consistently outperforms both coupled and decoupled baselines across multiple benchmark datasets and tasks, while preserving the efficiency and scalability of decoupled architectures.
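To make the propagation stage concrete, here is a minimal NumPy sketch of what parameter-free, similarity-guided propagation could look like. It is not the authors' implementation: the summary does not give CAMPA's exact weighting rule, so the choice to reweight each edge by the rescaled cosine similarity of the other modality, the added self-loops, the row normalization, and the names `cosine_similarity_matrix` and `aligned_propagation` are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, 1e-12)  # guard against all-zero rows
    return unit @ unit.T


def aligned_propagation(adj, x_self, x_other, num_hops):
    """Propagate x_self for num_hops steps over similarity-reweighted edges.

    Assumption (not from the paper): each edge (i, j) is scaled by the
    cosine similarity of nodes i and j in the *other* modality, mapped
    from [-1, 1] to [0, 1]. No learnable parameters are involved.
    """
    prior = (cosine_similarity_matrix(x_other) + 1.0) / 2.0
    # Add self-loops, then keep only existing edges, scaled by the prior.
    weighted = (adj + np.eye(adj.shape[0])) * prior
    # Row-normalize so every hop is a weighted average of neighbors.
    transition = weighted / np.maximum(weighted.sum(axis=1, keepdims=True), 1e-12)

    trajectory = [x_self]
    for _ in range(num_hops):
        trajectory.append(transition @ trajectory[-1])
    # Shape: (num_hops + 1, num_nodes, feature_dim)
    return np.stack(trajectory, axis=0)


# Toy path graph with 3 nodes and two modalities of different widths.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
x_text = rng.normal(size=(3, 4))
x_image = rng.normal(size=(3, 5))
text_trajectory = aligned_propagation(adj, x_text, x_image, num_hops=2)
image_trajectory = aligned_propagation(adj, x_image, x_text, num_hops=2)
```

Because the transition matrix depends only on the graph and the raw features, the whole trajectory can be computed once before training, which is the usual source of efficiency in decoupled pipelines.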

📝 Abstract

Multimodal Graph Neural Networks (MGNNs) have shown strong potential for learning from multimodal attributed graphs, yet most existing approaches rely on tightly coupled architectures that suffer from prohibitive computational overhead. In this paper, we present a systematic empirical analysis showing that decoupled MGNNs are substantially more efficient and scalable for large-scale graph learning. However, we identify a critical bottleneck in existing decoupled pipelines, namely modal conflict, which arises in both the propagation and aggregation stages. Specifically, independent multi-hop diffusion causes cross-modal semantic divergence during propagation, while naive fusion fails to align multi-hop feature trajectories during aggregation, jointly limiting effective representation learning. To address this challenge, we propose CAMPA, a Cross-modal Aligned Multimodal Propagation & Aggregation framework for decoupled multimodal graph learning. Concretely, CAMPA introduces a two-stage alignment mechanism: (1) cross-modal aligned propagation, which injects cross-modal similarity priors into message passing to preserve semantic consistency without additional parameter overhead; (2) trajectory aligned aggregation, which leverages trajectory-level self-attention and cross-attention to capture and align long-range dependencies across modalities and hops. Extensive experiments on diverse benchmark datasets and tasks demonstrate that CAMPA consistently outperforms strong coupled and decoupled baselines while preserving the efficiency advantages of the decoupled paradigm.
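The trajectory-aligned aggregation step can be sketched in the same spirit. The NumPy example below treats each node's multi-hop features as a short token sequence, applies self-attention within each modality, then cross-attention between modalities, and mean-pools over hops. It is only an illustration under stated assumptions: the abstract does not specify the layer layout, so the ordering of the two attention steps, the mean pooling, the concatenation, the omission of learned query/key/value projections, and the names `attention` and `aggregate_trajectories` are all hypothetical. Both modalities are also assumed to share one feature width.

```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def attention(queries, keys, values):
    """Scaled dot-product attention over the hop axis.

    All inputs have shape (num_nodes, num_hops, dim); each node attends
    only over its own trajectory tokens.
    """
    scale = np.sqrt(queries.shape[-1])
    weights = softmax(queries @ keys.transpose(0, 2, 1) / scale, axis=-1)
    return weights @ values


def aggregate_trajectories(traj_a, traj_b):
    """Fuse two per-node hop trajectories into one embedding per node.

    Learned projections are omitted to keep the sketch short; a trained
    model would normally apply them before each attention call.
    """
    # Self-attention: relate hops within each modality.
    self_a = attention(traj_a, traj_a, traj_a)
    self_b = attention(traj_b, traj_b, traj_b)
    # Cross-attention: each modality queries the other's trajectory.
    cross_a = attention(self_a, self_b, self_b)
    cross_b = attention(self_b, self_a, self_a)
    # Mean-pool over hops and concatenate the two modalities.
    return np.concatenate([cross_a.mean(axis=1), cross_b.mean(axis=1)], axis=-1)


# Toy input: 5 nodes, 3 hops (including hop 0), shared width 8.
rng = np.random.default_rng(0)
traj_text = rng.normal(size=(5, 3, 8))
traj_image = rng.normal(size=(5, 3, 8))
fused = aggregate_trajectories(traj_text, traj_image)
```

Attention here runs over hops rather than over nodes, so its cost grows with the number of hops per node and not with graph size, which is consistent with the efficiency claim for the decoupled paradigm.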

Problem

Research questions and friction points this paper is trying to address.

modal conflict

multimodal graph learning

decoupled propagation

feature alignment

semantic divergence

Innovation

Methods, ideas, or system contributions that make the work stand out.

decoupled multimodal graph learning

cross-modal alignment

trajectory-aligned aggregation