🤖 AI Summary
This study addresses a critical limitation in current value alignment research, which predominantly treats human values as static and overlooks the dynamic impact of alignment interventions on the broader value system. To bridge this gap, the authors propose the "Value Alignment Tax" (VAT) framework, leveraging Schwartz's theory of basic human values to construct a scenario-action dataset that enables the first quantitative assessment of systematic shifts in non-target values during alignment. Through normative judgment pairing, multidimensional value annotation, comparative analysis of alignment strategies, and modeling of value co-variation, the work reveals that alignment interventions often induce imbalanced yet structurally coherent shifts across interrelated values. This approach introduces a novel dimension for process-level risk assessment and a dynamic understanding of value alignment in large language models.
📝 Abstract
Existing work on value alignment typically characterizes value relations statically, ignoring how interventions such as prompting, fine-tuning, or preference optimization reshape the broader value system. We introduce the Value Alignment Tax (VAT), a framework that measures how alignment-induced changes propagate across interconnected values relative to the gain achieved on the target value. VAT captures the dynamics of value expression under alignment pressure. Using a controlled scenario-action dataset grounded in Schwartz value theory, we collect paired pre- and post-intervention normative judgments and analyze alignment effects across models, values, and alignment strategies. Our results show that alignment often produces uneven, structured co-movement among values. These effects are invisible under conventional target-only evaluation, revealing systemic, process-level alignment risks and offering new insights into the dynamics of value alignment in LLMs.
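The abstract does not give a formal definition of VAT. As a minimal illustrative sketch only, assuming VAT relates the aggregate shift in non-target Schwartz values to the gain achieved on the target value, the computation might look like the following; the `value_alignment_tax` function, the 0-1 score scale, and the example numbers are hypothetical and not taken from the paper.

```python
# Illustrative sketch only: the exact VAT formula is not specified in the abstract.
# Assumes each model state is scored on Schwartz's ten basic values (0-1 scale)
# and that VAT is the off-target drift normalized by the on-target gain.

SCHWARTZ_VALUES = [
    "self-direction", "stimulation", "hedonism", "achievement", "power",
    "security", "conformity", "tradition", "benevolence", "universalism",
]

def value_alignment_tax(pre: dict, post: dict, target: str, eps: float = 1e-6) -> float:
    """Hypothetical VAT: total absolute shift on non-target values,
    divided by the gain achieved on the target value."""
    gain = post[target] - pre[target]
    off_target_shift = sum(
        abs(post[v] - pre[v]) for v in SCHWARTZ_VALUES if v != target
    )
    return off_target_shift / max(gain, eps)

# Example: aligning toward "benevolence" while other values drift.
pre = {v: 0.5 for v in SCHWARTZ_VALUES}
post = {**pre, "benevolence": 0.7, "power": 0.4, "tradition": 0.55}
print(value_alignment_tax(pre, post, target="benevolence"))  # (0.1 + 0.05) / 0.2 = 0.75
```

Under this reading, a higher VAT would indicate that a given on-target improvement came at a larger cost in unintended movement elsewhere in the value system, which is the kind of effect the paper argues target-only evaluation misses.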