Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification

📅 2026-04-13

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of conventional Mixture-of-Experts (MoE) architectures for graph neural networks: their static routing cannot allocate computation according to how hard each node is to classify, which often leads to underfitting on hard nodes and redundant computation on easy ones. To overcome this, the authors propose D2MoE, a framework that integrates node difficulty into the MoE routing mechanism. It uses predictive entropy as a real-time estimate of node difficulty and introduces a difficulty-aware top-p sparse routing strategy, enabling fine-grained, continuous allocation of expert resources on demand. Evaluated on 13 benchmark datasets, D2MoE achieves state-of-the-art performance, with accuracy improvements of up to 7.92% on heterophilous graphs; on large-scale graphs it reduces memory consumption by up to 73.07% and training time by 46.53% compared to the best-performing Graph MoE.
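
The two quantities named in this summary can be written out as follows. The entropy is the standard Shannon entropy of a node's predicted class distribution. The symbols ($\hat{y}_{v,c}$, $g_{v,k}$, $p_v$, $\mathcal{S}_v$) and the top-p set definition are illustrative assumptions, not notation taken from the paper, and the exact way D2MoE maps entropy to a threshold is not given here.

```latex
% Predictive entropy of node v over C classes (standard definition)
H_v = -\sum_{c=1}^{C} \hat{y}_{v,c} \log \hat{y}_{v,c}

% Top-p expert set for node v: the smallest set of experts whose
% gate probabilities g_{v,k} sum to at least the threshold p_v
\mathcal{S}_v = \operatorname*{arg\,min}_{\mathcal{S} \subseteq \{1,\dots,K\}} |\mathcal{S}|
\quad \text{s.t.} \quad \sum_{k \in \mathcal{S}} g_{v,k} \ge p_v

% Assumption: p_v is some non-decreasing function of H_v,
% so higher-entropy (harder) nodes receive more experts
```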

📝 Abstract

Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13 benchmarks demonstrate that D2MoE achieves consistent state-of-the-art performance, surpassing leading baselines by up to 7.92% in accuracy on heterophilous graphs. Notably, on large-scale graphs, it reduces memory consumption by up to 73.07% and training time by 46.53% compared to the best-performing Graph MoE, thereby validating its superior efficiency.
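
The routing idea in the abstract might be sketched as below. This is a minimal NumPy illustration, not the authors' implementation: the function names, the `p_min` and `p_max` parameters, and the linear mapping from normalized entropy to the top-p threshold are all assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def normalized_entropy(probs, eps=1e-12):
    """Shannon entropy of each row, divided by log(num_classes)."""
    h = -(probs * np.log(probs + eps)).sum(axis=-1)
    return h / np.log(probs.shape[-1])


def difficulty_top_p_mask(gate_logits, class_probs, p_min=0.3, p_max=0.9):
    """Return a boolean [num_nodes, num_experts] mask of selected experts.

    Hypothetical rule (an assumption, not from the paper): the per-node
    top-p threshold grows linearly with normalized predictive entropy.
    """
    gate = softmax(gate_logits)
    difficulty = normalized_entropy(class_probs)
    p = p_min + (p_max - p_min) * difficulty  # one threshold per node

    # Sort experts by gate probability, highest first.
    order = np.argsort(-gate, axis=-1)
    sorted_gate = np.take_along_axis(gate, order, axis=-1)
    cum = np.cumsum(sorted_gate, axis=-1)

    # Keep an expert if the mass accumulated before it is still below p.
    # This selects the smallest prefix whose total mass reaches p and
    # always keeps the top-ranked expert.
    keep_sorted = (cum - sorted_gate) < p[:, None]

    # Scatter the decisions back to the original expert order.
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)
    return mask


if __name__ == "__main__":
    gate_logits = np.array([[2.0, 1.0, 0.0, -1.0],
                            [2.0, 1.0, 0.0, -1.0]])
    class_probs = np.array([[0.97, 0.01, 0.01, 0.01],   # confident (easy) node
                            [0.25, 0.25, 0.25, 0.25]])  # uncertain (hard) node
    mask = difficulty_top_p_mask(gate_logits, class_probs)
    print(mask.sum(axis=1))  # experts activated per node
```

With identical gate scores, the uncertain node receives a larger threshold and therefore activates more experts than the confident one, which is the per-node budget scaling the abstract describes.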

Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts

Graph Neural Networks

Node Classification

Difficulty-Aware Routing

Expert Budget Allocation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Difficulty-Aware Routing

Dynamic Mixture-of-Experts

Graph Neural Networks