VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

📅 2026-03-18
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the challenge of aligning large language models with multiple, potentially conflicting human values, a process often hindered by high computational cost and performance degradation. The authors propose VC-Soup, a framework guided by a value consistency metric based on cosine similarity. Each preference pair is scored by the cosine similarity between its reward-gap vector and an all-ones vector, low-consistency pairs are filtered out, a policy model is fine-tuned on each filtered value dataset, and the resulting policies are linearly merged. Training on value-consistent data helps the policies better preserve linear mode connectivity, and Pareto filtering across values selects merged models with balanced trade-offs. Experiments show that VC-Soup consistently outperforms existing multi-value alignment methods, improving both the balance and the overall level of value alignment.
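The consistency score and filtering step described in the summary can be sketched as follows. Per the paper's abstract, a preference pair's score is the cosine similarity between its reward-gap vector and the all-ones vector. This minimal NumPy sketch assumes the gaps were already computed with one reward model per value (entry `[i, k]` is the reward of the chosen response minus that of the rejected response under value k). The function names and the `threshold` cutoff are hypothetical, since the source does not say how "low consistency" is defined.

```python
import numpy as np


def value_consistency(reward_gaps: np.ndarray) -> np.ndarray:
    """Cosine similarity between each pair's reward-gap vector and the all-ones vector.

    reward_gaps: shape (n_pairs, n_values); row i holds the per-value reward gaps
    (chosen minus rejected) of preference pair i. Assumed input format.
    """
    gaps = np.asarray(reward_gaps, dtype=float)
    ones = np.ones(gaps.shape[1])
    dots = gaps @ ones  # sum of the gaps in each row
    norms = np.linalg.norm(gaps, axis=1) * np.linalg.norm(ones)
    # An all-zero gap vector carries no preference signal; scoring it 0 is an assumption.
    return np.divide(dots, norms, out=np.zeros_like(dots), where=norms > 0)


def filter_pairs(reward_gaps: np.ndarray, threshold: float):
    """Return indices of pairs whose consistency is >= threshold (hypothetical cutoff), plus all scores."""
    scores = value_consistency(reward_gaps)
    keep = np.flatnonzero(scores >= threshold)
    return keep, scores
```

A pair whose gaps all point the same way scores 1, a pair on which the values disagree scores near 0, and a pair that every value reverses scores -1. For example, gap rows `[1, 1, 1]`, `[1, -1, 0]`, `[2, 0, 0]` and `[-1, -1, -1]` score roughly 1.0, 0.0, 0.577 and -1.0, so a threshold of 0.5 keeps pairs 0 and 2.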

📝 Abstract
As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-Soup, a data filtering and parameter merging framework grounded in value-consistent learning. We first design a value consistency metric based on the cosine similarity between the reward-gap vector of each preference pair and an all-ones vector, which quantifies its cross-value coherence. We then filter out low-consistency preference pairs in each value dataset and train on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, we linearly combine these policies and apply Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis demonstrate that VC-Soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
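The final stage of the abstract, linearly combining the value-specific policies and then applying Pareto filtering across values, can be sketched in the same hedged way. This assumes each policy is a dict of NumPy parameter arrays with identical keys and shapes, and that a hypothetical `evaluate` callable returns one score per value (higher is better) for a merged model. The abstract does not state how mixing weights are searched, so the uniform simplex grid controlled by `step` is also an assumption, as are all function names.

```python
import itertools

import numpy as np


def merge_policies(policies, weights):
    """Linear parameter merge: theta = sum_k w_k * theta_k, applied key by key."""
    return {name: sum(w * p[name] for w, p in zip(weights, policies)) for name in policies[0]}


def weight_grid(n_values, step):
    """All weight vectors on the probability simplex with spacing 1/round(1/step).

    Exhaustive enumeration of (n + 1) ** n_values combinations; only practical for a few values.
    """
    n = round(1 / step)
    return [tuple(c / n for c in combo)
            for combo in itertools.product(range(n + 1), repeat=n_values)
            if sum(combo) == n]


def pareto_front(scores):
    """Indices of score vectors not dominated by any other vector (higher is better on every value)."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(np.all(t >= s) and np.any(t > s)
                        for j, t in enumerate(scores) if j != i)
        if not dominated:
            front.append(i)
    return front


def soup_sweep(policies, evaluate, step=0.25):
    """Merge the policies at every grid weight, score each merge, keep the Pareto-optimal ones."""
    candidates = []
    for w in weight_grid(len(policies), step):
        merged = merge_policies(policies, w)
        candidates.append((w, np.asarray(evaluate(merged), dtype=float)))
    keep = pareto_front([score for _, score in candidates])
    return [candidates[i] for i in keep]
```

With two toy "policies" whose single parameter vector doubles as its score vector (`evaluate=lambda params: params["w"]`), every grid point lies on the trade-off line between the two values, so all five merges at `step=0.25` survive the Pareto filter. With real models, merges that are worse on every value are discarded.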
Problem

Research questions and friction points this paper is trying to address.

multi-value alignment
value conflict
large language models
trustworthy AI
human values
Innovation

Methods, ideas, or system contributions that make the work stand out.

value consistency
multi-value alignment
parameter merging
Pareto filtering
linear mode connectivity
Hefei Xu
Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, Anhui, China
Le Wu
Hefei University of Technology
recommender systems, user modeling, explainability and fairness in recommendation
Yu Wang
Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, Anhui, China
Min Hou
Hefei University of Technology
Han Wu
Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, Anhui, China
Zhen Zhang
Associate Professor of Electrical & Computer Engineering, Utah State University
formal methods, probabilistic model checking, deep neural networks, synthetic biology, Network-on
Meng Wang
Key Laboratory of Knowledge Engineering with Big Data, Hefei University of Technology, Hefei, Anhui, China