CoMoE: Contrastive Representation for Mixture-of-Experts in Parameter-Efficient Fine-tuning

📅 2025-05-23
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In parameter-efficient fine-tuning of Mixture-of-Experts (MoE) models, expert functional overlap and suboptimal capacity utilization persist under heterogeneous data distributions. Method: This paper proposes the first contrastive learning framework for sparse-gated MoE fine-tuning, built upon top-k routing. For each input, it constructs a contrastive objective between activated and inactivated experts, explicitly modeling the mutual-information gap between inputs and the two groups of experts; this approximated mutual information is optimized to sharpen expert specialization, enhance modularity, and improve capacity utilization. Contribution/Results: Experiments demonstrate consistent performance gains on standard and multi-task benchmarks, with measurable improvements in expert specialization and no added computational overhead.

📝 Abstract
In parameter-efficient fine-tuning, mixture-of-experts (MoE), which involves specializing functionalities into different experts and sparsely activating them appropriately, has been widely adopted as a promising approach to trade off model capacity against computation overhead. However, current MoE variants fall short on heterogeneous datasets, ignoring the fact that experts may learn similar knowledge, resulting in the underutilization of MoE's capacity. In this paper, we propose Contrastive Representation for MoE (CoMoE), a novel method to promote modularization and specialization in MoE, where the experts are trained along with a contrastive objective by sampling from activated and inactivated experts in top-k routing. We demonstrate that such a contrastive objective recovers the mutual-information gap between inputs and the two types of experts. Experiments on several benchmarks and in multi-task settings demonstrate that CoMoE can consistently enhance MoE's capacity and promote modularization among the experts.
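The contrastive objective described above can be sketched as follows. This is a minimal InfoNCE-style approximation under assumed tensor shapes, not the authors' released implementation; the function name, arguments, and the use of cosine similarity between the input representation and expert outputs are all illustrative assumptions.

```python
import numpy as np

def comoe_contrastive_loss(router_logits, expert_outputs, x_repr, k=2, temperature=0.1):
    """Sketch of a contrastive loss over activated vs. inactivated experts.

    router_logits:  (batch, n_experts) gating scores from top-k routing
    expert_outputs: (batch, n_experts, dim) per-expert representations
    x_repr:         (batch, dim) input representation used as the anchor
    """
    batch, n_experts = router_logits.shape

    # Top-k routing: mark the k highest-scoring experts per input as activated.
    topk = np.argpartition(-router_logits, k, axis=1)[:, :k]
    activated = np.zeros((batch, n_experts), dtype=bool)
    np.put_along_axis(activated, topk, True, axis=1)

    # Cosine similarity between each input anchor and every expert's output.
    x_norm = x_repr / np.linalg.norm(x_repr, axis=1, keepdims=True)
    e_norm = expert_outputs / np.linalg.norm(expert_outputs, axis=2, keepdims=True)
    sim = np.einsum("bd,bed->be", x_norm, e_norm) / temperature

    # InfoNCE-style objective: activated experts are positives,
    # inactivated experts serve as negatives.
    exp_sim = np.exp(sim)
    pos = (exp_sim * activated).sum(axis=1)
    return float(-np.log(pos / exp_sim.sum(axis=1)).mean())
```

Minimizing this loss pulls each input's representation toward its activated experts and away from the inactivated ones, which is how a contrastive objective of this shape can widen the mutual-information gap the abstract refers to.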
Problem

Research questions and friction points this paper is trying to address.

Improving expert specialization in MoE for parameter-efficient fine-tuning
Addressing knowledge overlap among experts in heterogeneous datasets
Enhancing MoE capacity and modularization via contrastive representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Contrastive objective enhances MoE specialization
Sampling from activated and inactivated experts
Improves modularization and mutual-information recovery
Jinyuan Feng
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences
Chaopeng Wei
University of Science and Technology Beijing
Tenghai Qiu
Institute of Automation, Chinese Academy of Sciences
intelligent decision; deep reinforcement learning; multi-agent
Tianyi Hu
Purdue University
Multi-phase flow
Zhiqiang Pu
Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences