🤖 AI Summary
This study investigates whether large language models (LLMs) express consistent value preferences across short-answer and long-form text generation. We propose a psychometrically grounded evaluation framework that combines validated value scales with controllable long-text generation, enabling cross-length and cross-argument-count analysis of five state-of-the-art LLMs. Using correlation analysis and multidimensional regression modeling, we find that (1) value preferences in short versus long responses are only weakly correlated (mean *r* < 0.3); (2) value-alignment interventions yield only marginal gains in consistency; (3) preference strength diminishes as argument specificity increases; and (4) models with stronger cross-scenario representational generalization exhibit more stable value preferences. These results identify generation length, argument specificity, and representation generalizability as critical determinants of value-expression stability, offering new analytical dimensions and an empirical foundation for LLM value assessment and alignment.
📝 Abstract
Evaluations of LLMs' ethical risks and value inclinations often rely on short-form surveys and psychometric tests, yet real-world use involves long-form, open-ended responses -- leaving value-related risks and preferences in practical settings largely underexplored. In this work, we ask: do value preferences inferred from short-form tests align with those expressed in long-form outputs? To answer this question, we compare value preferences elicited from short-form reactions with those from long-form responses, varying the number of arguments in the latter to reflect users' differing verbosity preferences. Analyzing five LLMs (llama3-8b, gemma2-9b, mistral-7b, qwen2-7b, and olmo-7b), we find (1) weak correlation between value preferences inferred from short-form and long-form responses across varying argument counts; (2) similarly weak correlation between preferences derived from any two distinct long-form generation settings; and (3) only modest consistency gains in value expression from alignment. We further examine how attributes of long-form generation relate to value preferences, finding that argument specificity correlates negatively with preference strength, while representation across scenarios correlates positively. Our findings underscore the need for more robust methods to ensure consistent value expression across diverse applications.
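The cross-setting comparison described above amounts to correlating two vectors of per-dimension value scores, one per elicitation setting, and checking whether *r* clears a consistency threshold. The sketch below illustrates that check; the variable names, the ten-dimension layout, and all scores are hypothetical placeholders, not the paper's data or code:

```python
# Minimal sketch of a cross-setting value-consistency check.
# Hypothetical setup: `short_form` and `long_form` hold one model's mean
# endorsement scores on ten value dimensions, elicited via a short-form
# psychometric test and a long-form generation setting respectively.
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length score vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical value-preference scores (ten value dimensions).
short_form = [4.2, 3.1, 2.8, 4.5, 3.9, 2.2, 3.3, 4.0, 2.9, 3.6]
long_form  = [3.0, 3.8, 4.1, 2.9, 3.2, 4.4, 2.7, 3.1, 4.0, 2.8]

r = pearson_r(short_form, long_form)
print(f"cross-setting r = {r:.2f}")  # mean |r| < 0.3 reads as weak consistency
```

In practice a library routine such as `scipy.stats.pearsonr` would replace the hand-rolled function; it is written out here only to keep the sketch self-contained.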