🤖 AI Summary
To address the pervasive heterophily in heterogeneous graphs—where modeling similarity between cross-type nodes is inherently challenging—this paper proposes HGMS, a novel pre-training framework. First, it introduces an edge-strength-aware heterogeneous edge dropping augmentation strategy to mitigate structural noise. Second, it incorporates a self-expressive matrix-driven multi-view homophily modeling mechanism: the learned self-expressive matrix serves as an auxiliary enhanced view, enabling accurate identification and correction of false negatives in contrastive learning. Notably, HGMS is the first method to jointly leverage self-expressive learning and edge-level adaptive augmentation for heterogeneous graph pre-training. Extensive experiments demonstrate that HGMS consistently outperforms state-of-the-art methods across multiple downstream tasks, empirically validating that explicit homophily modeling is critical for enhancing heterogeneous graph representation learning.
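The summary does not spell out how the edge-strength-aware dropping works; a minimal sketch of one plausible reading, where each edge carries a nonnegative strength score and weaker edges are dropped with higher probability (the function name, the normalization, and the rescaling to a target drop rate are all assumptions, not the paper's exact procedure):

```python
import numpy as np

def edge_strength_drop(edge_index, strength, drop_rate=0.3, rng=None):
    """Drop edges with probability inversely related to their strength.

    edge_index: (2, E) int array of (src, dst) pairs.
    strength:   (E,) nonnegative scores; weaker edges are dropped more often.
    drop_rate:  target average fraction of edges to drop.

    NOTE: illustrative sketch only -- the paper's actual strength
    definition and drop schedule may differ.
    """
    rng = rng or np.random.default_rng(0)
    s = np.asarray(strength, dtype=float)
    # Normalize strengths to [0, 1]; weak edges get a high drop probability.
    s_norm = (s - s.min()) / (s.max() - s.min() + 1e-12)
    p_drop = 1.0 - s_norm
    # Rescale so the expected drop fraction is roughly drop_rate.
    p_drop = np.clip(p_drop * drop_rate / (p_drop.mean() + 1e-12), 0.0, 1.0)
    keep = rng.random(s.shape[0]) >= p_drop
    return edge_index[:, keep]
```

Biasing the drop probability toward weak edges is what would make the augmented view more homophilous: strong (likely homophilous) connections survive, while noisy ones are pruned.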
📝 Abstract
Heterogeneous graph pre-training (HGP) has demonstrated remarkable performance across various domains. However, the issue of heterophily in real-world heterogeneous graphs (HGs) has been largely overlooked. To bridge this research gap, we propose a novel heterogeneous graph contrastive learning framework, termed HGMS, which leverages connection strength and multi-view self-expression to learn homophilous node representations. Specifically, we design a heterogeneous edge dropping augmentation strategy that enhances the homophily of augmented views. Moreover, we introduce a multi-view self-expressive learning method to infer the homophily between nodes. In practice, we develop two approaches to solve for the self-expressive matrix. The solved self-expressive matrix serves as an additional augmented view that provides homophilous information and is used to identify false negatives in the contrastive loss. Extensive experimental results demonstrate the superiority of HGMS across different downstream tasks.
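The abstract mentions two approaches for solving the self-expressive matrix without detailing either. One standard closed-form variant (an assumption for illustration, not necessarily one of the paper's two) is ridge-regularized self-expression, which reconstructs each node's embedding from all others and whose large off-diagonal coefficients can then flag likely-homophilous pairs as false negatives:

```python
import numpy as np

def self_expressive_matrix(X, lam=0.1):
    """Closed-form ridge self-expression.

    Solves argmin_C ||X - C X||_F^2 + lam * ||C||_F^2 over (n, n) matrices C.
    Setting the gradient to zero gives C = X X^T (X X^T + lam I)^{-1}.
    (Sketch under our assumptions; the paper's solvers may differ.)
    """
    n = X.shape[0]
    G = X @ X.T                                  # (n, n) Gram matrix of node embeddings
    C = G @ np.linalg.inv(G + lam * np.eye(n))   # closed-form solution
    return C

def false_negative_mask(C, tau=0.5):
    """Flag node pairs whose (symmetrized) self-expression coefficient exceeds
    tau: such pairs are likely homophilous and should not be pushed apart as
    negatives in the contrastive loss. The threshold tau is a hypothetical knob."""
    A = 0.5 * (np.abs(C) + np.abs(C).T)   # symmetrize coefficient magnitudes
    np.fill_diagonal(A, 0.0)              # ignore trivial self-reconstruction
    return A > tau
```

Because C_ij measures how much node j's embedding contributes to reconstructing node i, the matrix doubles as a dense, homophily-aware affinity graph, which matches the abstract's use of it as an additional augmented view.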