VC-Soup: Value-Consistency Guided Multi-Value Alignment for Large Language Models

📅 2026-03-18

🤖 AI Summary

This work addresses the challenge of aligning large language models with multiple, potentially conflicting human values, where existing approaches are hindered by the cost of training a separate model for each value combination and by performance degradation from value conflicts. The authors propose VC-Soup, a data filtering and parameter merging framework guided by a value consistency metric: the cosine similarity between each preference pair's reward-gap vector and an all-ones vector. Low-consistency pairs are filtered out of each value dataset, and policy models are trained on the remaining data so that they better preserve linear mode connectivity. The resulting policies are then linearly combined, and Pareto filtering across values selects solutions with balanced multi-value performance. The reported experiments and theoretical analysis indicate that VC-Soup mitigates value conflicts and consistently outperforms existing multi-value alignment methods.
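
The consistency metric and filtering step can be illustrated with a small sketch. It follows the abstract's definition (cosine similarity between a preference pair's reward-gap vector and an all-ones vector), but everything else is an assumption made for illustration: the function names, the dictionary keys `chosen_rewards` and `rejected_rewards`, the threshold of 0.5, and the choice to score a zero gap vector as 0. The paper's actual implementation may differ.

```python
import math


def value_consistency(reward_gap):
    """Cosine similarity between a reward-gap vector and the all-ones vector.

    reward_gap[k] is assumed to be (reward of chosen response) minus
    (reward of rejected response) under the k-th value's reward model.
    """
    norm = math.sqrt(sum(g * g for g in reward_gap))
    if norm == 0.0:
        # Assumption: a zero gap has no direction, so treat it as neutral.
        return 0.0
    # dot(gap, ones) = sum(gap); ||ones|| = sqrt(number of values)
    return sum(reward_gap) / (norm * math.sqrt(len(reward_gap)))


def filter_pairs(pairs, threshold=0.5):
    """Keep preference pairs whose consistency score is at least `threshold`.

    The threshold value is hypothetical; it is not taken from the paper.
    """
    kept = []
    for pair in pairs:
        gap = [c - r for c, r in zip(pair["chosen_rewards"], pair["rejected_rewards"])]
        if value_consistency(gap) >= threshold:
            kept.append(pair)
    return kept


# Two values (e.g. helpfulness and harmlessness), two preference pairs.
pairs = [
    # Chosen response is better under both values: high consistency.
    {"chosen_rewards": [0.9, 0.8], "rejected_rewards": [0.2, 0.3]},
    # Chosen response is better under one value and worse under the other.
    {"chosen_rewards": [0.9, 0.1], "rejected_rewards": [0.2, 0.7]},
]
consistent_pairs = filter_pairs(pairs)
```

A score near 1 means the chosen response is preferred by a similar margin under every value; a score near 0 or below means the values disagree about the pair, which is the kind of example the filtering step is meant to remove.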

📝 Abstract

As large language models (LLMs) increasingly shape content generation, interaction, and decision-making across the Web, aligning them with human values has become a central objective in trustworthy AI. This challenge becomes even more pronounced when aligning multiple, potentially conflicting human values. Although recent approaches, such as reward reweighting, prompt-based supervised fine-tuning, and model merging, attempt to tackle multi-value alignment, they still face two major limitations: (1) training separate models for each value combination is prohibitively expensive; (2) value conflicts substantially degrade alignment performance. These limitations make it difficult to achieve favorable trade-offs across diverse human values. To address these challenges, we revisit multi-value alignment from the perspective of value consistency in data and propose VC-Soup, a data filtering and parameter merging framework grounded in value-consistent learning. We first design a value consistency metric based on the cosine similarity between the reward-gap vector of each preference pair and an all-ones vector, which quantifies its cross-value coherence. We then filter out low-consistency preference pairs in each value dataset and train on the remaining data to obtain smooth, value-consistent policy models that better preserve linear mode connectivity. Finally, we linearly combine these policies and apply Pareto filtering across values to obtain solutions with balanced multi-value performance. Extensive experiments and theoretical analysis demonstrate that VC-Soup effectively mitigates conflicts and consistently outperforms existing multi-value alignment methods.
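
The final stage described in the abstract, linearly combining policies and then Pareto filtering across values, can be sketched as follows. This is a toy illustration under stated assumptions: parameters are plain dictionaries of float lists rather than real model weights, the names `merge_policies`, `dominates`, and `pareto_front` are hypothetical, and the score vectors are made-up numbers, not results from the paper.

```python
def merge_policies(policies, weights):
    """Linearly combine parameter dicts (name -> list of floats).

    Assumes all policies share the same parameter names and shapes and
    that the mixing weights sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixing weights must sum to 1")
    merged = {}
    for name, reference in policies[0].items():
        merged[name] = [
            sum(w * p[name][i] for w, p in zip(weights, policies))
            for i in range(len(reference))
        ]
    return merged


def dominates(a, b):
    """True if score vector a is at least as good as b on every value
    and strictly better on at least one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores):
    """Indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Two single-value policies with one toy parameter tensor each.
policy_a = {"w": [0.0, 2.0]}
policy_b = {"w": [4.0, 0.0]}
merged = merge_policies([policy_a, policy_b], [0.25, 0.75])

# Illustrative per-value scores for four merged candidates.
candidate_scores = [(0.9, 0.2), (0.5, 0.5), (0.4, 0.4), (0.2, 0.9)]
front = pareto_front(candidate_scores)
```

In this toy setup, sweeping the mixing weights yields a family of merged candidates, and the Pareto filter discards any candidate that another one matches or beats on every value.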

Problem

Research questions and friction points this paper is trying to address.

multi-value alignment

value conflict

large language models

trustworthy AI

human values

Innovation

Methods, ideas, or system contributions that make the work stand out.

value consistency

multi-value alignment

parameter merging