MoE Pathfinder: Trajectory-driven Expert Pruning

📅 2025-12-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying Mixture-of-Experts (MoE) large language models is hindered by high computational complexity and low activation efficiency. Method: This paper proposes a global path-planning pruning framework based on cross-layer expert activation trajectories. Unlike conventional approaches that rely on local metrics and uniform per-layer pruning, the method formulates expert selection as an optimal path search over a weighted computation graph and introduces a trajectory-level importance scoring mechanism that jointly incorporates reconstruction error, routing probability, and activation magnitude, enabling non-uniform expert retention across layers. Contribution/Results: Experiments demonstrate that the method outperforms existing pruning techniques across multiple benchmark tasks, maintaining high model accuracy while substantially reducing computational overhead and thereby improving the deployment feasibility of MoE models.

📝 Abstract
Mixture-of-experts (MoE) architectures used in large language models (LLMs) achieve state-of-the-art performance across diverse tasks yet face practical challenges such as deployment complexity and low activation efficiency. Expert pruning has thus emerged as a promising solution to reduce computational overhead and simplify the deployment of MoE models. However, existing expert pruning approaches conventionally rely on local importance metrics and often apply uniform layer-wise pruning, leveraging only partial evaluation signals and overlooking the heterogeneous contributions of experts across layers. To address these limitations, we propose an expert pruning approach based on the trajectory of activated experts across layers, which treats MoE as a weighted computation graph and casts expert selection as a global optimal path planning problem. Within this framework, we integrate complementary importance signals from reconstruction error, routing probabilities, and activation strength at the trajectory level, which naturally yields non-uniform expert retention across layers. Experiments show that our approach achieves superior pruning performance on nearly all tasks compared with most existing approaches.
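The abstract names three complementary importance signals that are fused into one trajectory-level score per expert. A minimal sketch of one plausible fusion, assuming a simple weighted sum; the linear form, the sign convention, and the weights `alpha`, `beta`, `gamma` are illustrative assumptions, not details taken from the paper:

```python
def trajectory_importance(recon_error: float, routing_prob: float,
                          act_norm: float, alpha: float = 1.0,
                          beta: float = 1.0, gamma: float = 1.0) -> float:
    """Score one expert from the three signals named in the abstract.

    Lower reconstruction error, higher routing probability, and stronger
    activation magnitude all suggest a more important expert, so the
    error term enters with a negative sign. The linear combination and
    unit default weights are assumptions for illustration only.
    """
    return -alpha * recon_error + beta * routing_prob + gamma * act_norm
```

For example, an expert with reconstruction error 0.5, routing probability 0.8, and activation norm 1.0 would score -0.5 + 0.8 + 1.0 = 1.3 under unit weights.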
Problem

Research questions and friction points this paper is trying to address.

Optimizes expert pruning in MoE models for efficiency
Addresses deployment complexity and activation inefficiency in LLMs
Selects experts globally using trajectory-based importance signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

Trajectory-based expert pruning for MoE models
Global optimal path planning for expert selection
Integrates multiple importance signals across layers
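The path-planning view in the bullets above can be sketched as a Viterbi-style dynamic program over a layered graph whose nodes are (layer, expert) pairs. Everything concrete here is an assumption for illustration: fully connected expert-to-expert edges with given weights (e.g. co-activation statistics), precomputed per-expert node scores, and pruning by a single global quantile, which naturally retains a different number of experts in each layer:

```python
import numpy as np

def best_path_through(scores, edges):
    """For each (layer, expert) node, compute the score of the best
    cross-layer trajectory passing through it.

    scores: list of L arrays, scores[l][e] = importance of expert e in
            layer l (assumed precomputed, e.g. a trajectory-level score).
    edges:  list of L-1 arrays, edges[l][i, j] = weight of the edge from
            expert i in layer l to expert j in layer l+1 (an assumption,
            e.g. co-activation frequency on calibration data).
    """
    L = len(scores)
    # Forward pass: best path score from layer 0 into each node.
    fwd = [scores[0].astype(float)]
    for l in range(1, L):
        fwd.append(scores[l] + (fwd[-1][:, None] + edges[l - 1]).max(axis=0))
    # Backward pass: best path score from each node to the last layer.
    bwd = [np.zeros_like(scores[L - 1], dtype=float)]
    for l in range(L - 2, -1, -1):
        nxt = edges[l] + scores[l + 1][None, :] + bwd[0][None, :]
        bwd.insert(0, nxt.max(axis=1))
    return [f + b for f, b in zip(fwd, bwd)]

def prune_mask(scores, edges, keep_ratio=0.5):
    """Keep experts whose best through-path score clears one GLOBAL
    threshold; per-layer retention counts then vary naturally."""
    through = best_path_through(scores, edges)
    thresh = np.quantile(np.concatenate(through), 1.0 - keep_ratio)
    return [t >= thresh for t in through]
```

Because the threshold is global rather than per-layer, layers whose experts sit on strong trajectories keep more experts than layers whose experts do not, matching the non-uniform retention the bullets describe.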
Xican Yang
University of Science and Technology of China

Yuanhe Tian
University of Washington
Computational Linguistics, Natural Language Processing

Yan Song
University of Science and Technology of China