VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

📅 2026-03-18

🤖 AI Summary
This work addresses the challenge of aligning large language models with multiple, potentially conflicting human values, where training a separate model for each value combination is computationally expensive and value conflicts degrade alignment performance. The authors propose VC-Soup, a data filtering and parameter merging framework built on a value consistency metric: the cosine similarity between each preference pair's reward-gap vector and an all-ones vector. Low-consistency preference pairs are filtered out of each value dataset, policy models are trained on the remaining data so that they better preserve linear mode connectivity, and the resulting policies are linearly combined, with Pareto filtering across values used to select balanced solutions. According to the authors, experiments and theoretical analysis show that VC-Soup mitigates value conflicts and consistently outperforms existing multi-value alignment methods.
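
The metric and the merging step described above can be written compactly. The notation below is an assumption made for illustration, not taken from the paper: $K$ values with reward models $r_1, \dots, r_K$, a preference pair $(x, y^{+}, y^{-})$, per-value policy parameters $\theta_k$, and merging weights $\lambda_k$.

```latex
% Reward-gap vector of one preference pair (assumed notation)
\Delta(x, y^{+}, y^{-}) = \bigl( r_k(x, y^{+}) - r_k(x, y^{-}) \bigr)_{k=1}^{K} \in \mathbb{R}^{K}

% Value consistency: cosine similarity with the all-ones vector \mathbf{1}
c(x, y^{+}, y^{-})
  = \frac{\Delta^{\top} \mathbf{1}}{\lVert \Delta \rVert_2 \, \lVert \mathbf{1} \rVert_2}
  = \frac{\sum_{k=1}^{K} \Delta_k}{\sqrt{K} \, \lVert \Delta \rVert_2}
  \in [-1, 1]

% Linear combination of the value-specific policies
\theta(\lambda) = \sum_{k=1}^{K} \lambda_k \theta_k,
  \qquad \lambda_k \ge 0, \quad \sum_{k=1}^{K} \lambda_k = 1
```

Under this reading, $c = 1$ exactly when all $K$ rewards prefer $y^{+}$ by the same margin, and $c$ falls toward $-1$ as the rewards disagree with the preference label.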

πŸ“ Abstract
As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-soup, a data filtering and parameter merging framework grounded in value-consistent learning. We first design a value consistency metric based on the cosine similarity between the reward-gap vector of each preference pair and an all-ones vector, which quantifies its cross-value coherence. We then filter out low-consistency preference pairs in each value dataset and train on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, we linearly combine these policies and apply Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis demonstrate that VC-soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
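
The pipeline in the abstract (score each preference pair, drop low-consistency pairs, linearly combine policies, keep Pareto-optimal candidates) can be sketched on toy data. This is a minimal illustration and not the authors' implementation: the function names, the `threshold` value, the handling of zero gap vectors, and the representation of a policy as a dict of scalar parameters are all assumptions.

```python
import math


def value_consistency(gap):
    """Cosine similarity between one reward-gap vector and the all-ones vector.

    gap[k] is assumed to be r_k(chosen) - r_k(rejected) for value k.
    """
    norm = math.sqrt(sum(g * g for g in gap))
    if norm == 0.0:
        # Assumption: a zero gap vector has no direction, so score it 0.
        return 0.0
    return sum(gap) / (norm * math.sqrt(len(gap)))


def filter_pairs(gaps, threshold=0.5):
    """Indices of preference pairs whose consistency reaches the threshold.

    The threshold is a hypothetical hyperparameter chosen for this sketch.
    """
    return [i for i, gap in enumerate(gaps) if value_consistency(gap) >= threshold]


def merge_policies(policies, weights):
    """Linearly combine policies given as {parameter_name: value} dicts.

    Weights are normalized to sum to 1 before merging.
    """
    total = sum(weights)
    return {
        name: sum(w / total * p[name] for w, p in zip(weights, policies))
        for name in policies[0]
    }


def pareto_front(scores):
    """Indices of candidates not dominated by any other (higher is better)."""
    front = []
    for i, s in enumerate(scores):
        dominated = any(
            all(o >= v for o, v in zip(other, s))
            and any(o > v for o, v in zip(other, s))
            for other in scores
        )
        if not dominated:
            front.append(i)
    return front


# Toy reward gaps for two values (e.g. helpfulness, harmlessness).
gaps = [[1.0, 1.0], [1.0, -1.0], [2.0, 0.0], [0.0, 0.0]]
kept = filter_pairs(gaps)  # pairs 0 and 2 pass the 0.5 threshold

# Toy per-value scores of four merged candidates.
front = pareto_front([[1, 3], [2, 2], [1, 1], [3, 1]])
```

In this toy run the pair with gaps `[1.0, -1.0]` scores 0 because the two values disagree, so it is filtered out, and the candidate scoring `[1, 1]` is dropped from the front because `[2, 2]` beats it on both values. In practice the parameters would be model weight tensors and the scores would come from reward-model evaluation of each merged policy.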

Problem

Research questions and friction points this paper is trying to address.

multi-value alignment
value conflict
large language models
trustworthy AI
human values

Innovation

Methods, ideas, or system contributions that make the work stand out.

value consistency
multi-value alignment
parameter merging
Pareto filtering
linear mode connectivity