XPERT: Expert Knowledge Transfer for Effective Training of Language Models

📅 2026-05-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses how to efficiently reuse cross-domain, generalizable knowledge from pretrained Mixture-of-Experts (MoE) large language models to improve the training efficiency and performance of language models at different scales. It proposes XPERT, a framework that treats MoE expert modules as structured, traceable knowledge sources. The method identifies cross-domain experts through inference-only analysis of expert activation patterns, refines their representations through tensor decomposition, and adapts the extracted knowledge for reuse in downstream models. Experiments on language understanding and dialogue generation benchmarks show consistently stronger performance and faster convergence than strong baselines.
📝 Abstract
Mixture-of-Experts (MoE) language models organize knowledge into explicitly routed expert modules, making expert-level representations traceable and analyzable. By analyzing expert activation patterns in MoE large language models (LLMs), we find that a subset of experts is consistently activated across diverse knowledge domains. These common experts encode cross-domain, generalizable knowledge that is closely related to model generalization, naturally raising the question of how such identifiable expert knowledge can be practically reused. Motivated by this observation, we propose XPERT, a framework that extracts, consolidates, and reuses expert knowledge from pre-trained MoE LLMs to support more effective training of language models across different model scales. XPERT identifies cross-domain experts via inference-only analysis, refines their representations through tensor decomposition, and adapts the extracted knowledge to reuse in downstream models. Experiments on language understanding and dialogue generation benchmarks show that models benefiting from reused expert knowledge achieve consistently stronger performance and faster convergence compared to strong baselines. These results highlight MoE LLMs as structured and reusable knowledge sources, and demonstrate the value of expert-level knowledge reuse for improving model training.
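To make the identification step more concrete, here is a minimal sketch of how experts that are consistently activated across domains might be picked out from routing traces. It is an illustration only: the abstract does not state the selection criterion, so the per-domain activation rate, the `min_rate` threshold, the function names, and the toy domains are all assumptions rather than the paper's method. It also covers a single MoE layer and leaves out the tensor decomposition and adaptation steps.

```python
from collections import Counter


def activation_rates(routed_experts, num_experts):
    """Fraction of tokens in one domain that were routed to each expert.

    routed_experts: one entry per token, each a list of the expert ids
    the router selected for that token (top-k routing).
    """
    counts = Counter(e for token in routed_experts for e in token)
    total = len(routed_experts)
    return [counts[e] / total for e in range(num_experts)]


def common_experts(routing_by_domain, num_experts, min_rate=0.5):
    """Experts whose activation rate is at least min_rate in every domain.

    min_rate is a hypothetical threshold chosen for this example.
    """
    rates = {
        domain: activation_rates(routed, num_experts)
        for domain, routed in routing_by_domain.items()
    }
    return [
        e for e in range(num_experts)
        if all(rates[domain][e] >= min_rate for domain in rates)
    ]


# Toy routing traces: 4 experts, top-2 routing, three hypothetical domains.
routing_by_domain = {
    "math": [[0, 1], [0, 2], [0, 1], [0, 3]],
    "law": [[0, 3], [0, 3], [2, 3], [0, 3]],
    "code": [[0, 2], [0, 2], [0, 1], [0, 2]],
}

selected = common_experts(routing_by_domain, num_experts=4)
print(selected)
```

In this toy trace only expert 0 clears the threshold in all three domains. Expert 3 is heavily used in the "law" domain but rarely elsewhere, which is the kind of domain-specific expert such a filter would exclude.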
Problem

Research questions and friction points this paper is trying to address.

Mixture-of-Experts
expert knowledge reuse
language model training
cross-domain knowledge
knowledge transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mixture-of-Experts
expert knowledge transfer
tensor decomposition
knowledge reuse
language model training
Authors
Chang Liu
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Boyu Shi
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Xu Yang
School of Computer Science and Engineering, Southeast University, Nanjing, China; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China
Xin Geng
School of Computer Science and Engineering, Southeast University
Artificial Intelligence
Pattern Recognition
Machine Learning