Generative Value Conflicts Reveal LLM Priorities

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

career value

183K/year

🤖 AI Summary

Existing alignment datasets lack scenarios involving value conflicts, hindering evaluation of LLMs’ priority decisions under pluralistic value tensions. Method: We propose ConflictScope—the first automated evaluation framework for value conflicts—leveraging LLMs to generate multidimensional conflict scenarios and user instructions, and employing both multiple-choice and open-ended responses to systematically quantify models’ value trade-off behaviors. Results: Experiments reveal that models consistently prioritize individual over protective values; moreover, injecting explicit value orderings into system prompts significantly modulates their ranking behavior, improving alignment performance by 14%. This work establishes the first scalable, reproducible evaluation paradigm for value conflicts, empirically validating the critical role of explicit value guidance in conflict resolution. It provides a novel benchmark and actionable intervention pathway for value-aligned LLM development.

Technology Category

Application Category

📝 Abstract

Past work seeks to align large language model (LLM)-based assistants with a target set of values, but such assistants are frequently forced to make tradeoffs between values when deployed. In response to the scarcity of value conflict in existing alignment datasets, we introduce ConflictScope, an automatic pipeline to evaluate how LLMs prioritize different values. Given a user-defined value set, ConflictScope automatically generates scenarios in which a language model faces a conflict between two values sampled from the set. It then prompts target models with an LLM-written "user prompt" and evaluates their free-text responses to elicit a ranking over values in the value set. Comparing results between multiple-choice and open-ended evaluations, we find that models shift away from supporting protective values, such as harmlessness, and toward supporting personal values, such as user autonomy, in more open-ended value conflict settings. However, including detailed value orderings in models' system prompts improves alignment with a target ranking by 14%, showing that system prompting can achieve moderate success at aligning LLM behavior under value conflict. Our work demonstrates the importance of evaluating value prioritization in models and provides a foundation for future work in this area.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM value prioritization in conflict scenarios

Automatically generating value conflict test scenarios

Assessing alignment shifts between multiple-choice and open-ended responses

Innovation

Methods, ideas, or system contributions that make the work stand out.

Automated pipeline generates value conflict scenarios

Evaluates free-text responses to elicit value rankings

System prompting improves alignment with target values

🔎 Similar Papers

No similar papers found.