🤖 AI Summary
Graph Neural Networks (GNNs) struggle to model heterophilous graph structures and to capture long-range dependencies, while Graph Transformers alleviate these issues but suffer from poor scalability and limited robustness to noise. To address these limitations, we propose GNNMoE, a novel paradigm for general node classification that integrates a Mixture-of-Experts (MoE) mechanism with fine-grained decoupled message passing. We design a soft-hard dual-mode gating mechanism for node-level dynamic expert selection, and introduce adaptive residual connections and an enhanced feed-forward network to improve representation robustness and generalization. Extensive experiments on multiple graph benchmarks demonstrate that GNNMoE significantly outperforms state-of-the-art methods, effectively mitigating over-smoothing and global noise. Moreover, it exhibits strong cross-graph-type adaptability, superior robustness to structural and feature perturbations, and high computational efficiency on large-scale graphs.
📝 Abstract
Graph neural networks excel at graph representation learning but struggle with heterophilous data and long-range dependencies, while graph transformers address these issues through self-attention yet face scalability and noise challenges on large-scale graphs. To overcome these limitations, we propose GNNMoE, a universal model architecture for node classification. This architecture flexibly combines fine-grained message-passing operations with a mixture-of-experts mechanism to build feature-encoding blocks. Furthermore, by incorporating soft and hard gating layers that assign the most suitable expert networks to each node, we enhance the model's expressive power and its adaptability to different graph types. In addition, we introduce adaptive residual connections and an enhanced FFN module into GNNMoE, further improving the expressiveness of node representations. Extensive experimental results demonstrate that GNNMoE performs exceptionally well across various types of graph data: it effectively alleviates over-smoothing and global noise, enhances model robustness and adaptability, and maintains computational efficiency on large-scale graphs.
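To make the node-level expert-selection idea concrete, below is a minimal, illustrative NumPy sketch of a mixture-of-experts block with soft and hard gating. This is not the paper's implementation: the linear "experts", the gating matrix, and all names (`MoEBlock`, `forward`, `hard`) are hypothetical stand-ins for the actual message-passing experts and gating layers described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class MoEBlock:
    """Toy mixture-of-experts block (hypothetical sketch, not GNNMoE itself).

    Each 'expert' is a simple linear map standing in for a message-passing
    operation; a gating network mixes expert outputs per node.
    """
    def __init__(self, d_in, d_out, n_experts):
        self.experts = [rng.normal(size=(d_in, d_out)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.normal(size=(d_in, n_experts)) * 0.1

    def forward(self, X, hard=False):
        scores = X @ self.gate                      # (n_nodes, n_experts)
        if hard:
            # hard gating: route each node to its single top-scoring expert
            weights = np.eye(scores.shape[1])[scores.argmax(axis=1)]
        else:
            # soft gating: convex combination over all experts per node
            weights = softmax(scores, axis=1)
        # stack expert outputs: (n_nodes, n_experts, d_out)
        outs = np.stack([X @ W for W in self.experts], axis=1)
        # weighted sum over the expert axis -> (n_nodes, d_out)
        return (weights[:, :, None] * outs).sum(axis=1)

X = rng.normal(size=(5, 8))          # 5 nodes with 8-dimensional features
block = MoEBlock(8, 4, n_experts=3)
H_soft = block.forward(X)            # soft mode: blend of all experts
H_hard = block.forward(X, hard=True) # hard mode: one expert per node
```

In the soft mode every node receives a weighted blend of all experts, while the hard mode routes each node to exactly one expert; the dual-mode gating in GNNMoE lets the model trade off between these behaviors per node.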