Learning How Much to Think: Difficulty-Aware Dynamic MoEs for Graph Node Classification

2026-04-13

AI Summary
This work addresses a limitation of conventional Mixture-of-Experts (MoE) architectures for graph neural networks: their static routing cannot allocate computation according to how difficult each node is to classify, which often leads to underfitting on hard nodes and redundant computation on easy ones. To overcome this, the authors propose D2MoE, a framework that, by their account, is the first to integrate node difficulty into the MoE routing mechanism. It uses predictive entropy, computed in real time, to assess node difficulty and introduces a difficulty-aware top-p sparse routing strategy that allocates expert resources to each node on demand in a fine-grained, continuous way. Evaluated on 13 benchmark datasets, D2MoE achieves state-of-the-art performance, with accuracy improvements of up to 7.92% on heterophilic graphs, while reducing memory consumption by up to 73.07% and training time by 46.53% on large-scale graphs compared with the best-performing Graph MoE.
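To make the difficulty proxy concrete, here is a minimal NumPy sketch of normalized predictive entropy per node. It assumes the classifier outputs one logit vector per node and that entropy is scaled by log(C) to land in [0, 1]; the function name `normalized_predictive_entropy` and the normalization are illustrative assumptions, not the authors' code.

```python
import numpy as np

def normalized_predictive_entropy(logits: np.ndarray) -> np.ndarray:
    """Per-node entropy of the softmax over classes, scaled to [0, 1].

    Hypothetical helper (not from the paper). logits: (num_nodes, num_classes),
    with num_classes >= 2. Returns (num_nodes,): 0 = fully confident, 1 = uniform.
    """
    # Numerically stable softmax: subtract the row-wise max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Shannon entropy H = -sum_c p_c log p_c; clipping avoids log(0).
    entropy = -(probs * np.log(np.clip(probs, 1e-12, 1.0))).sum(axis=1)
    # Divide by log(C), the entropy of the uniform distribution over C classes.
    return entropy / np.log(logits.shape[1])

logits = np.array([[8.0, 0.0, 0.0],   # confident prediction -> "easy" node
                   [1.0, 1.0, 1.0]])  # uniform prediction   -> "hard" node
print(normalized_predictive_entropy(logits).round(3))
```

A confident node scores near 0 and a node with a uniform prediction scores 1, so the value can be read directly as a difficulty score.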

Abstract
Mixture-of-Experts (MoE) architectures offer a scalable path for Graph Neural Networks (GNNs) in node classification tasks but typically rely on static and rigid routing strategies that enforce a uniform expert budget or coarse-grained expert toggles on all nodes. This limitation overlooks the varying discriminative difficulty of nodes and leads to under-fitting for hard nodes and redundant computation for easy ones. To resolve this issue, we propose D2MoE, a novel framework that shifts the focus from static expert selection to node-wise expert resource allocation. By using predictive entropy as a real-time proxy for difficulty, D2MoE employs a difficulty-driven top-p routing mechanism to adaptively concentrate expert resources on hard nodes while reducing overhead for easy ones, achieving continuous and fine-grained expert budget scaling for node classification. Experiments on 13 benchmarks demonstrate that D2MoE achieves consistent state-of-the-art performance, surpassing leading baselines by up to 7.92% in accuracy on heterophilous graphs. Notably, on large-scale graphs, it reduces memory consumption by up to 73.07% and training time by 46.53% compared to the best-performing Graph MoE, thereby validating its superior efficiency.
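The idea of difficulty-driven top-p routing can be sketched as follows. Experts are sorted by gate probability, and each node keeps the smallest prefix whose cumulative probability reaches a node-specific threshold p, so harder nodes activate more experts. The linear schedule mapping difficulty to p, and the names `difficulty_top_p_route`, `p_min`, and `p_max`, are hypothetical choices for illustration; the abstract does not specify the exact rule D2MoE uses.

```python
import numpy as np

def difficulty_top_p_route(gate_logits, difficulty, p_min=0.5, p_max=0.95):
    """Node-wise top-p (nucleus) expert selection driven by a difficulty score.

    Hypothetical sketch. gate_logits: (num_nodes, num_experts) router scores;
    difficulty: (num_nodes,) in [0, 1]. Assumes p_min > 0 so every node keeps
    at least its top-1 expert. Returns a boolean mask and renormalized weights.
    """
    # Assumed linear schedule: harder nodes get a larger cumulative-mass threshold.
    p = p_min + (p_max - p_min) * np.clip(difficulty, 0.0, 1.0)
    shifted = gate_logits - gate_logits.max(axis=1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=1, keepdims=True)
    # Rank experts by gate probability, highest first.
    order = np.argsort(-probs, axis=1)
    sorted_probs = np.take_along_axis(probs, order, axis=1)
    cum = np.cumsum(sorted_probs, axis=1)
    # Keep an expert while the mass of higher-ranked experts is still below p.
    keep_sorted = (cum - sorted_probs) < p[:, None]
    mask = np.zeros_like(keep_sorted)
    np.put_along_axis(mask, order, keep_sorted, axis=1)
    # Renormalize gate weights over the selected experts only.
    weights = np.where(mask, probs, 0.0)
    weights /= weights.sum(axis=1, keepdims=True)
    return mask, weights

gate = np.array([[2.0, 1.0, 0.5, 0.0]] * 3)  # identical router scores for three nodes
difficulty = np.array([0.0, 0.5, 1.0])       # easy, medium, hard
mask, weights = difficulty_top_p_route(gate, difficulty)
print(mask.sum(axis=1))  # experts used per node -> [1 2 4]
```

With identical router scores, the easy node uses a single expert while the hard node uses all four, which mirrors the abstract's point that the expert budget scales with node difficulty rather than being fixed for all nodes.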
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
Graph Neural Networks
Node Classification
Difficulty-Aware Routing
Expert Budget Allocation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Difficulty-Aware Routing
Dynamic Mixture-of-Experts
Graph Neural Networks
Node-wise Resource Allocation
Predictive Entropy
Jiajun Zhou
Zhejiang University of Technology
Graph Data Mining, Graph Data Augmentation, Blockchain Data Analysis, Graph for Cybersecurity
Yadong Li
Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; Binjiang Cyberspace Security Institute of ZJUT, Hangzhou, 310056, China; Soovar Technologies Co., Ltd., Hangzhou 310056, China
Xuanze Chen
Ph.D., HongShan Capital
Biophotonics, Super-resolution Microscopy
Chen Ma
Gaoling School of Artificial Intelligence, Renmin University of China
LLM-based Agent, Recommender System
Chuang Zhao
PhD Candidate, The Hong Kong University of Science and Technology
AI for Healthcare, Recommendation System, Transfer Learning
Shanqing Yu
Institute of Cyberspace Security, Zhejiang University of Technology, Hangzhou 310023, China; Binjiang Cyberspace Security Institute of ZJUT, Hangzhou, 310056, China; Soovar Technologies Co., Ltd., Hangzhou 310056, China
Qi Xuan
Professor, Zhejiang University of Technology
AI Security, Social Network, Deep Learning, Data Mining