🤖 AI Summary
Addressing the challenge of jointly aligning large language models (LLMs) with the tripartite objectives of helpfulness, honesty, and harmlessness (3H), this work introduces the first benchmark dedicated to 3H-aware model merging, uncovering latent cooperation and conflict mechanisms across these dimensions. The authors propose R-TSVM, a training-free model merging framework that integrates outlier-aware parameter weighting and sparsity-adaptive rank selection to mitigate interference from redundant parameters and outliers. Unlike conventional data-mixing paradigms, R-TSVM combines singular value decomposition of task vectors, parameter-level conflict resolution, and heavy-tailed distribution modeling to achieve superior multi-objective trade-offs. Extensive experiments show that R-TSVM consistently outperforms 12 model merging and 3 data mixture baselines across 10 datasets and 5 annotation dimensions, substantially improving holistic 3H alignment. All models are publicly released on Hugging Face.
📝 Abstract
Achieving balanced alignment of large language models (LLMs) in terms of Helpfulness, Honesty, and Harmlessness (3H optimization) constitutes a cornerstone of responsible AI, yet existing methods such as data mixture strategies face limitations, including reliance on expert knowledge and conflicting optimization signals. While model merging offers a promising alternative by integrating specialized models, its potential for 3H optimization remains underexplored. This paper establishes the first comprehensive benchmark for model merging in 3H-aligned LLMs, systematically evaluating 15 methods (12 training-free merging and 3 data mixture techniques) across 10 datasets associated with 5 annotation dimensions, 2 LLM families, and 2 training paradigms. Our analysis reveals three pivotal insights: (i) previously overlooked collaborative and conflicting relationships among the 3H dimensions, (ii) the consistent superiority of model merging over data mixture approaches in balancing alignment trade-offs, and (iii) the critical role of parameter-level conflict resolution through redundant component pruning and outlier mitigation. Building on these findings, we propose R-TSVM, a Reweighting-enhanced Task Singular Vector Merging method that incorporates outlier-aware parameter weighting and sparsity-adaptive rank selection strategies adapted to the heavy-tailed, sparse parameter distributions of LLMs, further improving LLM alignment across multiple evaluations. Our models will be available at https://huggingface.co/Jinluan.
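To make the merging idea concrete, below is a minimal, hypothetical sketch of SVD-based task-vector merging of the kind the abstract describes: each specialized model contributes a task vector (its weight delta from the shared base), which is truncated to its top singular directions before a weighted sum. The function name, the fixed `rank` cutoff, and the uniform default weights are illustrative assumptions; R-TSVM's actual outlier-aware weighting and sparsity-adaptive rank selection are not reproduced here.

```python
import numpy as np

def merge_task_singular_vectors(base, finetuned_list, rank=8, weights=None):
    """Hypothetical sketch: merge one layer's weight matrices by
    low-rank SVD reconstruction of each model's task vector.

    base           -- (m, n) weight matrix of the shared base model
    finetuned_list -- list of (m, n) matrices from specialized models
    rank           -- number of singular directions kept per task vector
                      (a stand-in for the paper's adaptive rank selection)
    weights        -- per-model merge coefficients (uniform by default,
                      standing in for outlier-aware reweighting)
    """
    if weights is None:
        weights = [1.0 / len(finetuned_list)] * len(finetuned_list)
    merged_delta = np.zeros_like(base)
    for w, ft in zip(weights, finetuned_list):
        delta = ft - base  # task vector for one specialized model
        U, S, Vt = np.linalg.svd(delta, full_matrices=False)
        # Keep only the top-`rank` singular directions: pruning the
        # remaining (redundant) components is one form of the
        # parameter-level conflict resolution discussed above.
        low_rank = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
        merged_delta += w * low_rank
    return base + merged_delta
```

With `rank` equal to the full rank and uniform weights, this reduces to simple task-vector averaging; the interesting regime is a small `rank`, where near-zero singular directions (noise and redundancy) are dropped before models are combined.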