AI Summary
Graph neural networks (GNNs) suffer from inflexible message-passing architectures, limiting adaptability to diverse graph structures and downstream tasks. Existing graph mixture-of-experts (MoE) approaches rely heavily on supervised signals and exhibit unstable training due to expert heterogeneity. To address these limitations, we propose ADaMoRE, a novel unsupervised graph MoE framework. ADaMoRE introduces a backbone-residual expert architecture coupled with a structure-aware gating mechanism and an information-entropy-based diversity regularization, enabling functional specialization of experts and enhanced training stability. The method jointly optimizes a self-supervised graph reconstruction objective and information-theoretic constraints in an end-to-end manner. Evaluated across 16 benchmarks, ADaMoRE achieves state-of-the-art performance in unsupervised node classification and few-shot learning, while demonstrating superior generalization, high training efficiency, and rapid convergence.
Abstract
Graph Neural Networks (GNNs) face a fundamental adaptability challenge: their fixed message-passing architectures struggle with the immense diversity of real-world graphs, where optimal computational strategies vary by local structure and task. While Mixture-of-Experts (MoE) offers a promising pathway to adaptability, existing graph MoE methods remain constrained by their reliance on supervised signals and by instability when training heterogeneous experts. We introduce ADaMoRE (Adaptive Mixture of Residual Experts), a principled framework that enables robust, fully unsupervised training of heterogeneous MoE models on graphs. ADaMoRE employs a backbone-residual expert architecture in which foundational encoders provide stability while specialized residual experts capture diverse computational patterns. A structurally-aware gating network performs fine-grained node routing. The entire architecture is trained end-to-end using a unified unsupervised objective that integrates a primary reconstruction task with an information-theoretic diversity regularizer to explicitly enforce functional specialization among the experts. Theoretical analysis confirms that our design improves data efficiency and training stability. Extensive evaluation across 16 benchmarks validates ADaMoRE's state-of-the-art performance in unsupervised node classification and few-shot learning, alongside superior generalization, training efficiency, and faster convergence on diverse graphs and tasks.
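To make the backbone-residual design concrete, the following is a minimal NumPy sketch of one forward step: a shared backbone transform, gate-weighted residual expert corrections, and an entropy-based diversity term on average expert usage. All shapes, weight matrices, and the linear "experts" are illustrative assumptions, not the paper's actual layers (the real gating network is also structure-aware, i.e. it would additionally consume graph topology features).

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_nodes, d, n_experts = 4, 8, 3

# Hypothetical node embeddings and random linear maps standing in for
# the backbone encoder, the residual experts, and the gating network.
h = rng.normal(size=(n_nodes, d))
W_backbone = rng.normal(size=(d, d))
W_experts = rng.normal(size=(n_experts, d, d))
W_gate = rng.normal(size=(d, n_experts))

# Per-node routing weights over experts (rows sum to 1).
gates = softmax(h @ W_gate)                           # (n_nodes, n_experts)

# Backbone output plus gate-weighted residual expert corrections:
# the backbone provides a stable base, experts add specialized residuals.
backbone_out = h @ W_backbone                          # (n_nodes, d)
expert_out = np.einsum('nd,edk->nek', h, W_experts)    # (n_nodes, n_experts, d)
residual = np.einsum('ne,nek->nk', gates, expert_out)  # (n_nodes, d)
out = backbone_out + residual

# Entropy of the mean gate distribution: maximizing it pushes toward
# balanced expert usage, one simple form of a diversity regularizer.
mean_usage = gates.mean(axis=0)
diversity_reg = -(mean_usage * np.log(mean_usage + 1e-12)).sum()
```

In a full training loop this entropy term would be added (with a sign and coefficient of one's choosing) to the self-supervised reconstruction loss, so that routing diversity is optimized jointly with representation quality.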