SD-MoE: Spectral Decomposition for Effective Expert Specialization

📅 2026-02-13

📝 Abstract
Mixture-of-Experts (MoE) architectures scale Large Language Models via expert specialization induced by conditional computation. In practice, however, expert specialization often fails: some experts become functionally similar, while others function as de facto shared experts, limiting effective capacity and model performance. In this work, we analyze the parameter and gradient spaces from a spectral perspective and uncover that (1) experts share highly overlapping dominant spectral components in their parameters, (2) dominant gradient subspaces are strongly aligned across experts, driven by the ubiquitous low-rank structure of human corpora, and (3) gating mechanisms preferentially route inputs along these dominant directions, further limiting specialization. To address this, we propose Spectral-Decoupled MoE (SD-MoE), which decomposes both parameters and gradients in the spectral space. SD-MoE improves performance across downstream tasks, enables effective expert specialization, incurs minimal additional computation, and can be seamlessly integrated into a wide range of existing MoE architectures, including Qwen and DeepSeek.
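To make observation (1) concrete, the following is a minimal sketch of one way to quantify how much two experts' dominant spectral components overlap. It is not the paper's procedure: the overlap metric (mean squared cosine of the principal angles between top-k left singular subspaces), the function names `top_k_subspace` and `subspace_overlap`, and the synthetic "expert" matrices are all assumptions chosen for illustration.

```python
import numpy as np

def top_k_subspace(weight, k):
    """Return the top-k left singular vectors of a weight matrix."""
    u, _, _ = np.linalg.svd(weight, full_matrices=False)
    return u[:, :k]

def subspace_overlap(w_a, w_b, k):
    """Mean squared cosine of the principal angles between the top-k
    left singular subspaces of two matrices (0 = orthogonal, 1 = identical)."""
    u_a = top_k_subspace(w_a, k)
    u_b = top_k_subspace(w_b, k)
    return float(np.linalg.norm(u_a.T @ u_b, "fro") ** 2 / k)

rng = np.random.default_rng(0)
d_out, d_in, k = 64, 32, 4

# Hypothetical experts: a strong shared low-rank component plus
# small expert-specific perturbations (illustrative data only).
shared = 5.0 * rng.standard_normal((d_out, k)) @ rng.standard_normal((k, d_in))
expert_a = shared + 0.1 * rng.standard_normal((d_out, d_in))
expert_b = shared + 0.1 * rng.standard_normal((d_out, d_in))

# A matrix with no shared component, for comparison.
independent = rng.standard_normal((d_out, d_in))

overlap_shared = subspace_overlap(expert_a, expert_b, k)
overlap_indep = subspace_overlap(expert_a, independent, k)
print(f"experts with shared component: {overlap_shared:.3f}")
print(f"expert vs. independent matrix: {overlap_indep:.3f}")
```

Under these assumptions, the two synthetic experts that share a dominant low-rank component score close to 1, while an unrelated random matrix scores near the chance level of roughly k / d_out. The same measurement could in principle be applied to per-expert gradients to probe observation (2).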
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
expert specialization
spectral decomposition
parameter overlap
gradient alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Spectral Decomposition
Mixture-of-Experts
Expert Specialization
Gradient Subspace
Conditional Computation
Ruijun Huang
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China; Greater Bay Area National Center of Technology Innovation, Research Institute of Tsinghua University in Shenzhen, Shenzhen, China
Fang Dong
Southeast University
Edge Computing, Cloud, AIoT
Xin Zhang
Fudan University
Speech, Multimodal LLM, LLM, NLP
Hengjie Cao
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Zhendong Huang
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Anrui Chen
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Jixian Zhou
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Mengyi Chen
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Yifeng Yang
Department of Computer Science, Shanghai Jiaotong University
Machine Learning
Mingzhi Dong
University of Bath
Yujiang Wang
University of Oxford
AI in Healthcare, AI4Science
Jinlong Hou
Shanghai Innovation Institute (SII)
machine learning, deep learning, high performance computing, drug discovery, medical
Qin Lv
University of Colorado Boulder
data analytics for ubiquitous computing and scientific discovery (systems, algorithms, applications)
Robert P. Dick
Department of Electrical Engineering and Computer Science, University of Michigan
Yuan Cheng
Shanghai Innovation Institute, Shanghai, China
Fan Yang
School of Finance and Business, Shanghai Normal University
Operations research, Scheduling, Performance analysis
Tun Lu
College of Computer Science and Artificial Intelligence, Fudan University, Shanghai, China
Chun Zhang
Tsinghua University, Beijing Visual Science and Translational Eye Research Institute (BERI)
glaucoma, stem cell, ganglion cell, ophthalmology, device
Li Shang
Fudan University, Univ. Colorado Boulder
Human-centered Computing, machine learning, computer systems, VLSI & EDA