TMTE: Effective Multimodal Graph Learning with Task-aware Modality and Topology Co-evolution

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Real-world multimodal attributed graphs often suffer from topological imperfections, such as noisy interactions, missing links, and task-irrelevant structures, which limit their adaptability to diverse downstream tasks. To address this challenge, this work proposes TMTE, a framework built on the bidirectional coupling between modalities and topology: multimodal attributes induce relational structure, while graph topology shapes modality representations. Through a task-aware co-evolution mechanism, TMTE jointly and iteratively optimizes the graph structure and the multimodal representations. Topology evolution is cast as multi-perspective metric learning over modality embeddings with an anchor-based approximation, and modality evolution as smoothness-regularized fusion with cross-modal alignment; together they form a closed co-evolutionary loop. Experiments on nine multimodal graph datasets and one non-graph multimodal dataset, covering six graph-centric and modality-centric tasks, show that TMTE achieves state-of-the-art performance.
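The anchor-based metric learning step can be illustrated with a minimal pure-Python sketch. It assumes a construction similar to anchor-based graph learning methods such as IDGL, which the summary does not confirm for TMTE: each perspective has a hypothetical weight vector, the node-anchor affinity R is the mean weighted cosine similarity over perspectives (weak or negative values dropped), and the dense n x n adjacency is never built. Messages instead pass node → anchor → node as Δ⁻¹ R Λ⁻¹ Rᵀ H, where Λ and Δ hold the column and row sums of R. The names `weighted_cosine`, `node_anchor_affinity`, `anchor_propagate`, and `eps` are illustrative, not TMTE's code.

```python
import math
from typing import List

Matrix = List[List[float]]


def weighted_cosine(x: List[float], y: List[float], w: List[float]) -> float:
    """Cosine similarity after scaling both vectors element-wise by perspective weights w."""
    xw = [a * c for a, c in zip(x, w)]
    yw = [b * c for b, c in zip(y, w)]
    nx = math.sqrt(sum(a * a for a in xw))
    ny = math.sqrt(sum(b * b for b in yw))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(xw, yw)) / (nx * ny)


def node_anchor_affinity(nodes: Matrix, anchors: Matrix,
                         perspectives: Matrix, eps: float = 0.0) -> Matrix:
    """R[i][k]: mean weighted cosine over all (hypothetical) perspectives, zeroed when <= eps."""
    R = []
    for x in nodes:
        row = []
        for a in anchors:
            s = sum(weighted_cosine(x, a, w) for w in perspectives) / len(perspectives)
            row.append(s if s > eps else 0.0)  # sparsify weak or negative affinities
        R.append(row)
    return R


def anchor_propagate(R: Matrix, H: Matrix) -> Matrix:
    """Return Delta^{-1} R Lambda^{-1} R^T H without building the n x n adjacency.

    Lambda = diag(column sums of R), Delta = diag(row sums of R).
    """
    n, m, d = len(R), len(R[0]), len(H[0])
    lam = [sum(R[i][k] for i in range(n)) for k in range(m)]
    # Step 1 (node -> anchor): Z = Lambda^{-1} R^T H, shape m x d.
    Z = [[0.0] * d for _ in range(m)]
    for k in range(m):
        if lam[k] == 0.0:
            continue  # an anchor with no attached nodes carries no message
        for i in range(n):
            for c in range(d):
                Z[k][c] += R[i][k] * H[i][c] / lam[k]
    # Step 2 (anchor -> node): out = Delta^{-1} R Z, shape n x d.
    out = []
    for i in range(n):
        delta = sum(R[i])
        row = [0.0] * d
        if delta > 0.0:  # nodes with no anchor affinity keep a zero row
            for k in range(m):
                for c in range(d):
                    row[c] += R[i][k] * Z[k][c] / delta
        out.append(row)
    return out
```

With n nodes, m anchors (m much smaller than n), and d feature dimensions, propagation costs O(nmd) rather than O(n²d), which is the usual motivation for an anchor approximation. For inspection on a small graph, passing the identity matrix as H materializes the implied row-normalized adjacency; in training one would pass the fused embeddings instead.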
📝 Abstract
Multimodal-attributed graphs (MAGs) are a fundamental data structure for multimodal graph learning (MGL), enabling both graph-centric and modality-centric tasks. However, our empirical analysis reveals inherent topology quality limitations in real-world MAGs, including noisy interactions, missing connections, and task-agnostic relational structures. A single graph derived from generic relationships is therefore unlikely to be universally optimal for diverse downstream tasks. To address this challenge, we propose Task-aware Modality and Topology co-Evolution (TMTE), a novel MGL framework that jointly and iteratively optimizes graph topology and multimodal representations toward the target task. TMTE is motivated by the bidirectional coupling between modality and topology: multimodal attributes induce relational structures, while graph topology shapes modality representations. Concretely, TMTE casts topology evolution as multi-perspective metric learning over modality embeddings with an anchor-based approximation, and formulates modality evolution as smoothness-regularized fusion with cross-modal alignment, yielding a closed-loop task-aware co-evolution process. Extensive experiments on 9 MAG datasets and 1 non-graph multimodal dataset across 6 graph-centric and modality-centric tasks show that TMTE consistently achieves state-of-the-art performance. Our code is available at https://anonymous.4open.science/r/TMTE-1873.
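Modality evolution can likewise be sketched, under loudly stated assumptions. Fusion is taken to be a per-node convex combination of a text and an image embedding with a hypothetical weight `alpha`; smoothness is the graph Laplacian regularizer ½ Σ_ij A_ij ‖f_i − f_j‖², which equals tr(Fᵀ L F) for a symmetric adjacency A with L = D − A; and cross-modal alignment is a simple mean (1 − cosine) penalty between the two views of each node. The abstract does not specify these choices (TMTE may, for instance, use a contrastive alignment loss), and the task loss that makes the process task-aware is omitted. All function names and the `lam` weight are illustrative.

```python
import math
from typing import List

Matrix = List[List[float]]


def fuse(text: Matrix, image: Matrix, alpha: float = 0.5) -> Matrix:
    """Hypothetical fusion rule: per-node convex combination of two modality embeddings."""
    return [
        [alpha * t + (1.0 - alpha) * v for t, v in zip(t_row, v_row)]
        for t_row, v_row in zip(text, image)
    ]


def smoothness(F: Matrix, A: Matrix) -> float:
    """Laplacian smoothness 0.5 * sum_ij A_ij * ||f_i - f_j||^2.

    For a symmetric adjacency A this equals tr(F^T L F) with L = D - A.
    """
    total = 0.0
    for i, f_i in enumerate(F):
        for j, f_j in enumerate(F):
            if A[i][j] != 0.0:
                total += A[i][j] * sum((a - b) ** 2 for a, b in zip(f_i, f_j))
    return 0.5 * total


def cosine(x: List[float], y: List[float]) -> float:
    """Plain cosine similarity; returns 0.0 if either vector is all zeros."""
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    if nx == 0.0 or ny == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(x, y)) / (nx * ny)


def alignment(text: Matrix, image: Matrix) -> float:
    """Hypothetical alignment penalty: mean (1 - cosine) between each node's two views."""
    return sum(1.0 - cosine(t, v) for t, v in zip(text, image)) / len(text)


def modality_objective(text: Matrix, image: Matrix, A: Matrix,
                       alpha: float = 0.5, lam: float = 0.1) -> float:
    """Alignment plus lam * smoothness of the fused embeddings (task loss omitted)."""
    F = fuse(text, image, alpha)
    return alignment(text, image) + lam * smoothness(F, A)
```

One plausible way to close the loop is to feed the adjacency produced by the topology step into `smoothness`, then reuse the updated fused embeddings as input to the next round of metric learning.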
Problem

Research questions and friction points this paper is trying to address.

multimodal graph learning
topology quality
task-agnostic structure
noisy interactions
missing connections
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal graph learning
task-aware co-evolution
topology optimization
modality representation
metric learning