Disentangling Homophily and Heterophily in Multimodal Graph Clustering

📅 2025-07-21
🤖 AI Summary
To address the degradation of clustering performance caused by the coexistence of homophilic and heterophilic neighborhoods in multimodal graphs, this paper proposes a disentangled multimodal graph clustering framework, DMGC. Framed as an initial study of multimodal graph clustering, DMGC explicitly separates homophilic relationships, which strengthen intra-class semantic consistency across modalities, from heterophilic relationships, which preserve modality-specific inter-class distinctions, and thereby builds two complementary graph views. It combines graph structure decomposition, cross-modal consistency modeling, and a Multimodal Dual-frequency Fusion mechanism that filters both views to reduce category confusion, and it trains with self-supervised alignment objectives that require no labels. Experiments on multiple multimodal and multi-relational graph benchmarks show that DMGC outperforms prior methods and achieves state-of-the-art clustering performance, supporting both its effectiveness on mixed neighborhood structures and its generalization across diverse multimodal graph settings.
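
The starting point above, that homophilic and heterophilic neighborhoods coexist in one graph, can be quantified with standard homophily measures. The sketch below is illustrative only and is not taken from the DMGC code: it computes the edge homophily ratio and a per-node (local) homophily score from ground-truth labels, which a diagnostic analysis can use but an unsupervised clustering model never sees. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Hypothetical helpers for diagnosing mixed neighborhoods; not from the DMGC repository.
Edge = Tuple[int, int]


def edge_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> float:
    """Fraction of undirected edges whose two endpoints share a label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


def local_homophily(edges: Sequence[Edge], labels: Sequence[int]) -> Dict[int, float]:
    """Per-node fraction of neighbors sharing the node's label (isolated nodes are omitted)."""
    neighbors: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    return {
        node: sum(labels[n] == labels[node] for n in nbrs) / len(nbrs)
        for node, nbrs in neighbors.items()
    }


if __name__ == "__main__":
    labels = [0, 0, 0, 1, 1, 1]
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (3, 0), (4, 1), (5, 2)]
    print(round(edge_homophily(edges, labels), 3))  # 0.571
    scores = local_homophily(edges, labels)
    # Nodes whose neighborhoods are at most half same-label
    print(sorted(n for n, h in scores.items() if h <= 0.5))  # [3, 4, 5]
```

In this toy graph the class-0 nodes are mostly homophilic (local score 2/3), while every class-1 node scores 0.5 or lower. A single global ratio of about 0.57 hides that split, which is why node-level scores are the more informative diagnostic for hybrid neighborhood patterns.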


📝 Abstract
Multimodal graphs, which integrate unstructured heterogeneous data with structured interconnections, offer substantial real-world utility but remain insufficiently explored in unsupervised learning. In this work, we initiate the study of multimodal graph clustering, aiming to bridge this critical gap. Through empirical analysis, we observe that real-world multimodal graphs often exhibit hybrid neighborhood patterns, combining both homophilic and heterophilic relationships. To address this challenge, we propose a novel framework, Disentangled Multimodal Graph Clustering (DMGC), which decomposes the original hybrid graph into two complementary views: (1) a homophily-enhanced graph that captures cross-modal class consistency, and (2) heterophily-aware graphs that preserve modality-specific inter-class distinctions. We introduce a Multimodal Dual-frequency Fusion mechanism that jointly filters these disentangled graphs through a dual-pass strategy, enabling effective multimodal integration while mitigating category confusion. Our self-supervised alignment objectives further guide the learning process without requiring labels. Extensive experiments on both multimodal and multi-relational graph datasets demonstrate that DMGC achieves state-of-the-art performance, highlighting its effectiveness and generalizability across diverse settings. Our code is available at https://github.com/Uncnbb/DMGC.
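
To make the decomposition and dual-frequency idea concrete, here is a minimal pure-Python sketch under explicit assumptions; it is not DMGC's implementation, whose actual graph construction, filters, and fusion are defined in the authors' repository. Assumed here: an edge enters the shared homophily-enhanced view only if every modality finds its endpoints similar (cosine similarity at least a hypothetical threshold `tau`), and it enters a modality's heterophily-aware view if that modality finds them dissimilar. Each modality is then low-pass filtered on the homophily view with `0.5 * (I + D^-1 A) X` and high-pass filtered on its own heterophily view with `(I - D^-1 A) X`, and the outputs are concatenated. These are generic graph-signal filters used as a stand-in for the paper's dual-pass strategy; all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch; the threshold rule and filter forms are assumptions, not DMGC's.
Edge = Tuple[int, int]
Matrix = List[List[float]]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def decompose(edges: Sequence[Edge], modalities: Sequence[Matrix],
              tau: float) -> Tuple[List[Edge], List[List[Edge]]]:
    """Split edges into one shared homophily view and one heterophily view per modality."""
    homo: List[Edge] = []
    hetero: List[List[Edge]] = [[] for _ in modalities]
    for u, v in edges:
        sims = [cosine(x[u], x[v]) for x in modalities]
        if all(s >= tau for s in sims):  # every modality agrees: cross-modal consistency
            homo.append((u, v))
        for m, s in enumerate(sims):
            if s < tau:  # modality m sees a dissimilar (heterophilic) pair
                hetero[m].append((u, v))
    return homo, hetero


def neighbor_mean(x: Matrix, edges: Sequence[Edge]) -> Matrix:
    """Row-normalized aggregation D^-1 A X; isolated nodes keep their own features."""
    n, d = len(x), len(x[0])
    sums = [[0.0] * d for _ in range(n)]
    deg = [0] * n
    for u, v in edges:
        for a, b in ((u, v), (v, u)):  # undirected: aggregate in both directions
            deg[a] += 1
            for k in range(d):
                sums[a][k] += x[b][k]
    return [[s / deg[i] for s in sums[i]] if deg[i] else list(x[i]) for i in range(n)]


def low_pass(x: Matrix, edges: Sequence[Edge]) -> Matrix:
    # 0.5 * (I + D^-1 A) X: smooths each node toward its neighbors
    agg = neighbor_mean(x, edges)
    return [[0.5 * (a + b) for a, b in zip(xi, gi)] for xi, gi in zip(x, agg)]


def high_pass(x: Matrix, edges: Sequence[Edge]) -> Matrix:
    # (I - D^-1 A) X: keeps what differs from the neighborhood average
    agg = neighbor_mean(x, edges)
    return [[a - b for a, b in zip(xi, gi)] for xi, gi in zip(x, agg)]


def dual_frequency_embed(edges: Sequence[Edge], modalities: Sequence[Matrix],
                         tau: float = 0.5) -> Matrix:
    """Per modality: low-pass on the shared homophily view, high-pass on its heterophily view."""
    homo, hetero = decompose(edges, modalities, tau)
    n = len(modalities[0])
    out: Matrix = [[] for _ in range(n)]
    for m, x in enumerate(modalities):
        lo, hi = low_pass(x, homo), high_pass(x, hetero[m])
        for i in range(n):
            out[i].extend(lo[i] + hi[i])
    return out


if __name__ == "__main__":
    text = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
    image = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    edges = [(0, 1), (1, 2), (2, 3)]
    print(decompose(edges, [text, image], tau=0.5))
    print(dual_frequency_embed(edges, [text, image])[1])
```

With `tau = 0.5`, edge (2, 3) is similar in the text modality but not in the image modality, so it is kept out of the shared homophily view and appears only in the image modality's heterophily view. The low-pass branch keeps the signal neighbors agree on, while the high-pass branch keeps what distinguishes a node from dissimilar neighbors, which is the intuition behind filtering the two views differently.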
Problem

Research questions and friction points this paper is trying to address.

Unsupervised clustering of multimodal graphs with hybrid patterns
Disentangling homophilic and heterophilic relationships in multimodal data
Integrating cross-modal consistency and modality-specific distinctions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes the hybrid graph into homophily-enhanced and heterophily-aware views
Uses Multimodal Dual-frequency Fusion for effective integration
Self-supervised alignment objectives guide learning without labels
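
The abstract does not state the form of the self-supervised alignment objectives. As one common choice, and purely as an assumption, a cross-view InfoNCE (contrastive) loss aligns each node's embedding in one view with its own embedding in another view while pushing it away from the other nodes' embeddings. The function name `info_nce` and the default temperature are illustrative, not taken from the paper.

```python
import math
from typing import List, Sequence

# Hypothetical stand-in for DMGC's alignment objectives; InfoNCE is an assumed choice.


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm else list(v)


def info_nce(z1: Sequence[Sequence[float]], z2: Sequence[Sequence[float]],
             temperature: float = 0.5) -> float:
    """Cross-view InfoNCE: row i of z1 and row i of z2 form the positive pair."""
    if len(z1) != len(z2) or not z1:
        raise ValueError("views must be non-empty and have the same number of nodes")
    a = [_normalize(v) for v in z1]
    b = [_normalize(v) for v in z2]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity between node i in view 1 and every node in view 2
        logits = [sum(x * y for x, y in zip(ai, bj)) / temperature for bj in b]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(s - m) for s in logits))
        total += log_sum_exp - logits[i]  # negative log-softmax of the positive pair
    return total / len(a)


if __name__ == "__main__":
    view_a = [[1.0, 0.0], [0.0, 1.0]]
    aligned = info_nce(view_a, [[1.0, 0.0], [0.0, 1.0]])
    swapped = info_nce(view_a, [[0.0, 1.0], [1.0, 0.0]])
    print(aligned < swapped)  # True
```

A lower loss means the two views agree on which node is which, and no labels are involved at any point. The log-sum-exp shift by the row maximum keeps the computation numerically stable for small temperatures.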