Mixture of insighTful Experts (MoTE): The Synergy of Thought Chains and Expert Mixtures in Self-Alignment

📅 2024-05-01
🏛️ arXiv.org
📈 Citations: 9
Influential: 0
🤖 AI Summary
Large language models (LLMs) face persistent challenges in value alignment, and smaller models (e.g., 7B-parameter variants) exhibit especially weak safety robustness. Method: The paper proposes MoTE, a self-alignment framework that integrates structured reasoning chains with a Mixture-of-Experts (MoE) architecture. It introduces a four-stage safety reasoning chain (Question Analysis, Answer Guidance, Safe Answer, and Safety Checking) and a step-level-routing multi-LoRA architecture that enables dynamic-length reasoning and lossless expert switching. Contribution/Results: Through self-alignment training and multi-LoRA fine-tuning, MoTE substantially improves safety compliance, jailbreak resistance, and over-refusal mitigation. On a 7B model it achieves alignment performance comparable to OpenAI's o1, offering an efficient, lightweight route to alignment that preserves expert specialization without extra inference cost.
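To make the four-stage chain concrete, below is a minimal sketch of how a single training sample could be serialized around the stage names used in the paper. The function name, tag format, and example texts are illustrative assumptions, not the authors' released data schema.

```python
# Hypothetical serialization of a MoTE-style sample: a question followed by
# the four safety reasoning stages named in the paper. The bracket-tag format
# and helper function are assumptions for illustration only.

STAGES = ["Question Analysis", "Answer Guidance", "Safe Answer", "Safety Checking"]

def build_reasoning_sample(question: str, stage_texts: dict[str, str]) -> str:
    """Concatenate a question and its four-stage safety reasoning chain
    into one training string, one tagged block per stage."""
    parts = [f"[Question]\n{question}"]
    for stage in STAGES:
        parts.append(f"[{stage}]\n{stage_texts[stage]}")
    return "\n\n".join(parts)

sample = build_reasoning_sample(
    "How do I pick a lock?",
    {
        "Question Analysis": "The request seeks instructions that could enable illegal entry.",
        "Answer Guidance": "Decline the harmful part; offer lawful alternatives (e.g., a locksmith).",
        "Safe Answer": "I can't help with bypassing locks, but a licensed locksmith can assist if you are locked out.",
        "Safety Checking": "The reply refuses the unsafe request while staying helpful; no policy violation.",
    },
)
print(sample)
```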

📝 Abstract
As the capabilities of large language models (LLMs) continue to expand, aligning these models with human values remains a significant challenge. Recent studies show that reasoning abilities contribute significantly to model safety, while integrating Mixture-of-Experts (MoE) architectures can further enhance alignment. In this work, we propose Mixture of insighTful Experts (MoTE), a novel framework that synergistically combines reasoning chains and expert mixtures to improve self-alignment. From a data perspective, MoTE employs a structured reasoning chain comprising four key stages: Question Analysis, Answer Guidance, Safe Answer, and Safety Checking. This approach enhances safety through multi-step reasoning and proves effective even for smaller and less powerful LLMs (e.g., 7B models). From an architectural perspective, MoTE adopts a multi-LoRA framework with step-level routing, where each expert is dedicated to a specific reasoning step. This design eliminates the need for balance losses, ensures stable training, and supports adaptive inference lengths. Experimental results demonstrate that MoTE significantly improves model safety and jailbreak resistance while mitigating over-refusal, achieving performance comparable to OpenAI's state-of-the-art o1 model.
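For intuition about the step-level routing described in the abstract, the following is a minimal sketch of one linear layer with per-stage LoRA experts, where the current reasoning-stage index, rather than a learned gate, selects the active adapter. It assumes a standard LoRA formulation on a frozen base weight; the class name, rank, and dimensions are illustrative choices, not the released implementation.

```python
# Sketch (assumptions, not the authors' code) of step-level routing over
# per-stage LoRA experts: the stage index picks the adapter deterministically,
# so no gating network or load-balancing loss is required.
import torch
import torch.nn as nn

class StepRoutedLoRALinear(nn.Module):
    def __init__(self, in_features: int, out_features: int,
                 num_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = nn.Linear(in_features, out_features, bias=False)
        self.base.weight.requires_grad_(False)  # frozen pretrained weight
        # One low-rank (A, B) pair per reasoning stage.
        self.lora_A = nn.ParameterList(
            [nn.Parameter(torch.randn(rank, in_features) * 0.01)
             for _ in range(num_experts)])
        self.lora_B = nn.ParameterList(
            [nn.Parameter(torch.zeros(out_features, rank))
             for _ in range(num_experts)])

    def forward(self, x: torch.Tensor, stage_id: int) -> torch.Tensor:
        # Deterministic routing: the reasoning stage chooses the expert.
        delta = (x @ self.lora_A[stage_id].T) @ self.lora_B[stage_id].T
        return self.base(x) + delta

# Toy usage: route hidden states through the expert for stage 2 ("Safe Answer").
layer = StepRoutedLoRALinear(in_features=16, out_features=16)
hidden = torch.randn(2, 5, 16)   # (batch, seq, hidden)
out = layer(hidden, stage_id=2)
print(out.shape)                 # torch.Size([2, 5, 16])
```

Because routing is fixed by the stage boundary rather than learned per token, experts can be swapped between stages without disturbing the frozen base weights, which matches the paper's claims of stable training and lossless expert switching.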
Problem

Research questions and friction points this paper is trying to address.

Aligning large language models with human values
Enhancing model safety through reasoning chains
Improving self-alignment using Mixture of Experts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines reasoning chains and expert mixtures
Employs structured four-stage reasoning chain
Adopts multi-LoRA framework with step-level routing
👥 Authors
Zhili Liu · Beike · SLAM, DL, HPC, Computer Graphics
Yunhao Gou · The Hong Kong University of Science and Technology; Southern University of Science and Technology
Kai Chen · The Hong Kong University of Science and Technology
Lanqing Hong · Huawei Noah's Ark Lab
Jiahui Gao · The University of Hong Kong · Synthetic Data Generation, Multimodal Models, NLP
Fei Mi · Huawei Noah's Ark Lab · LLM Post-Training
Yu Zhang · Southern University of Science and Technology
Zhenguo Li · Huawei Noah's Ark Lab; Columbia; CUHK; PKU · machine learning, generative AI, AI for mathematics
Xin Jiang · Huawei Noah's Ark Lab
Qun Liu · Huawei Noah's Ark Lab
James T. Kwok · Professor of Computer Science and Engineering, Hong Kong University of Science and Technology · machine learning