Mix Data or Merge Models? Balancing the Helpfulness, Honesty, and Harmlessness of Large Language Model via Model Merging

📅 2025-02-08
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the challenge of jointly aligning large language models (LLMs) with the tripartite objectives of helpfulness, honesty, and harmlessness (3H), this work introduces the first benchmark dedicated to 3H-aware model merging, uncovering latent cooperation and conflict mechanisms across these dimensions. We propose R-TSVM—a training-free model merging framework integrating outlier-aware parameter weighting and sparsity-adaptive rank selection—to mitigate interference from redundant parameters and outliers. Unlike conventional data-mixing paradigms, R-TSVM leverages singular value decomposition of task vectors, parameter-level conflict resolution, and heavy-tailed distribution modeling to achieve superior multi-objective trade-offs. Extensive experiments demonstrate that R-TSVM consistently outperforms 12 model merging and 3 data mixing baselines across 10 datasets and 5 annotation dimensions, significantly improving holistic 3H alignment. All models are publicly released on Hugging Face.

📝 Abstract
Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI, with existing methods like data mixture strategies facing limitations including reliance on expert knowledge and conflicting optimization signals. While model merging offers a promising alternative by integrating specialized models, its potential for 3H optimization remains underexplored. This paper establishes the first comprehensive benchmark for model merging in 3H-aligned LLMs, systematically evaluating 15 methods (12 training-free merging and 3 data mixture techniques) across 10 datasets associated with 5 annotation dimensions, 2 LLM families, and 2 training paradigms. Our analysis reveals three pivotal insights: (i) previously overlooked collaborative/conflicting relationships among 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation. Building on these findings, we propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed parameter distribution and sparsity for LLMs, further improving LLM alignment across multiple evaluations. Our models will be available at https://huggingface.co/Jinluan.
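The abstract describes R-TSVM only at a high level: task vectors (fine-tuned minus base weights) are compressed via singular value decomposition with an adaptive rank choice, and a reweighting step down-weights outlier entries from the heavy-tailed parameter distribution before merging. The sketch below is an illustrative reconstruction of that pipeline, not the paper's actual implementation; the energy threshold, the z-score outlier rule, and the 0.5 down-weighting factor are all hypothetical choices for demonstration.

```python
# Illustrative sketch of a TSVM-style merge (NOT the paper's code).
# Assumptions: `energy` controls the adaptive rank cut, and entries beyond
# z standard deviations are treated as outliers and down-weighted.
import numpy as np

def low_rank_task_vector(delta, energy=0.9):
    """Keep the smallest rank capturing `energy` of the spectral energy
    (redundant-component pruning via SVD)."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    cum = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(cum, energy)) + 1
    return (U[:, :k] * S[:k]) @ Vt[:k, :]

def outlier_aware_weights(delta, z=3.0):
    """Down-weight entries far in the heavy tail (|x| > z * std)."""
    scale = delta.std() + 1e-12
    return np.where(np.abs(delta) > z * scale, 0.5, 1.0)

def merge(base, finetuned_list, energy=0.9):
    """Merge fine-tuned models into the base via reweighted,
    low-rank task vectors (one specialist per 3H dimension)."""
    merged = base.copy()
    for ft in finetuned_list:
        delta = low_rank_task_vector(ft - base, energy)
        merged += outlier_aware_weights(delta) * delta / len(finetuned_list)
    return merged

# Toy usage: a base weight matrix and three "specialist" perturbations.
rng = np.random.default_rng(0)
base = rng.normal(size=(8, 8))
fts = [base + 0.1 * rng.normal(size=(8, 8)) for _ in range(3)]
out = merge(base, fts)
print(out.shape)  # (8, 8)
```

In this toy form, the SVD truncation plays the role of pruning redundant components, and the elementwise weights approximate the outlier-mitigation step the abstract attributes to the heavy-tailed distribution of LLM parameters.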
Problem

Research questions and friction points this paper is trying to address.

Balancing Helpfulness, Honesty, Harmlessness in LLMs
Exploring model merging for 3H optimization
Evaluating merging methods vs. data mixture techniques
Innovation

Methods, ideas, or system contributions that make the work stand out.

Model merging balances 3H alignment trade-offs better than data mixture.
R-TSVM resolves parameter-level conflicts via reweighted task singular vectors.
Redundant-component pruning and outlier mitigation improve alignment trade-offs.
Jinluan Yang
Ant Group Research Intern Program, Zhejiang University
Dingnan Jin
Ant Group
Anke Tang
Ph.D Student, Wuhan University
Machine Learning
Li Shen
Shenzhen Campus of Sun Yat-sen University
Didi Zhu
Imperial College London
Multi-Modal LLMs, Out-of-Distribution Generalization
Zhengyu Chen
Zhejiang University
Daixin Wang
Tsinghua University
Qing Cui
Ant Group
Zhiqiang Zhang
Ant Group
Jun Zhou
Ant Group
Fei Wu
Zhejiang University
Kun Kuang
Zhejiang University
Causal Inference, Data Mining, Machine Learning