Mind the Gap: How Elicitation Protocols Shape the Stated-Revealed Preference Gap in Language Models

📅 2026-01-29

🤖 AI Summary

This study addresses a limitation of current preference evaluations for language models: they rely predominantly on binary forced-choice prompts, which can conflate genuine preferences with artifacts of the elicitation protocol. The authors measure the consistency between stated preferences and revealed choices (SvR) across 24 language models under several elicitation protocols. They add neutral and abstention options to capture weak or uncertain preferences and also test system prompt interventions. Using the AIRiskDilemmas dataset, they compare models by Spearman rank correlation. When neutrality and abstention are allowed during stated preference elicitation, weak signals can be excluded, and SvR correlation with forced-choice revealed preferences improves substantially. When abstention is also allowed during revealed preference elicitation, however, high neutrality rates drive the correlation to near-zero or even negative values. System prompts that steer the model with its stated preferences do not reliably improve SvR correlation. These findings show that SvR consistency is highly sensitive to elicitation design, challenge the conventional forced-binary paradigm, and argue for evaluation frameworks that accommodate uncertain or indeterminate preferences.
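To make the SvR correlation concrete, here is a minimal Python sketch of Spearman's ρ, computed as the Pearson correlation of tie-averaged ranks. It is restricted to items whose stated preference is decisive. The data encoding is an assumption, not the authors' pipeline. Each item has a hypothetical signed stated-preference strength: positive favors option A, negative favors option B, 0 means neutral, and None means the model abstained. Each item also has a revealed score equal to the fraction of forced-choice dilemmas in which the model picked option A. The names `average_ranks`, `pearson`, `spearman_rho`, and `svr_rho` are illustrative.

```python
import math
from typing import List, Optional, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last position in the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 0-based positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; NaN if either input is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return math.nan
    return cov / math.sqrt(var_a * var_b)


def spearman_rho(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(a), average_ranks(b))


def svr_rho(stated: Sequence[Optional[float]], revealed: Sequence[float]) -> float:
    """Spearman's rho between stated and revealed scores, skipping neutral/abstained items.

    stated:   hypothetical signed strength per item (>0 favors A, <0 favors B,
              0 = neutral, None = abstained).
    revealed: fraction of forced-choice dilemmas in which the model picked A.
    """
    kept = [(s, r) for s, r in zip(stated, revealed) if s is not None and s != 0]
    if len(kept) < 2:
        return math.nan  # too few decisive items to rank
    kept_stated, kept_revealed = zip(*kept)
    return spearman_rho(kept_stated, kept_revealed)
```

In this sketch, dropping neutral and abstaining items before ranking is one way to "exclude weak signals." In practice, `scipy.stats.spearmanr` computes the same tie-corrected coefficient and could replace `spearman_rho`. The hand-rolled version only keeps the sketch dependency-free.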

📝 Abstract

Recent work identifies a stated-revealed (SvR) preference gap in language models (LMs): a mismatch between the values models endorse and the choices they make in context. Existing evaluations rely heavily on binary forced-choice prompting, which entangles genuine preferences with artifacts of the elicitation protocol. We systematically study how elicitation protocols affect SvR correlation across 24 LMs. Permitting neutrality and abstention during stated preference elicitation lets us exclude weak signals, substantially improving Spearman's rank correlation ($\rho$) between volunteered stated preferences and forced-choice revealed preferences. However, further allowing abstention during revealed preference elicitation drives $\rho$ to near-zero or negative values due to high neutrality rates. Finally, we find that system prompt steering using stated preferences during revealed preference elicitation does not reliably improve SvR correlation on AIRiskDilemmas. Together, our results show that SvR correlation is highly protocol-dependent and that preference elicitation requires methods that account for indeterminate preferences.
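Allowing abstention on the revealed side changes how forced choices aggregate into per-value scores. Below is a minimal Python sketch under a hypothetical encoding that is not the authors' aggregation. It assumes one record per dilemma, `(value_a, value_b, choice)`, where `choice` is `"A"`, `"B"`, or `"abstain"`. For each value, the sketch reports the neutrality rate and the win rate over decisive choices only. It shows the bookkeeping: as the neutrality rate rises, each win rate rests on fewer decisive choices. The function name `revealed_scores` and the record format are illustrative.

```python
from collections import defaultdict
from typing import Dict, Iterable, Literal, Optional, Tuple

# Hypothetical record format: (value_a, value_b, choice), one record per dilemma.
Choice = Literal["A", "B", "abstain"]


def revealed_scores(
    records: Iterable[Tuple[str, str, Choice]],
) -> Dict[str, Dict[str, Optional[float]]]:
    """Per-value neutrality rate and win rate over decisive (non-abstained) choices."""
    appearances: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    abstentions: Dict[str, int] = defaultdict(int)

    for value_a, value_b, choice in records:
        appearances[value_a] += 1
        appearances[value_b] += 1
        if choice == "abstain":
            # An abstention is a win for neither value; it is counted separately.
            abstentions[value_a] += 1
            abstentions[value_b] += 1
        elif choice == "A":
            wins[value_a] += 1
        elif choice == "B":
            wins[value_b] += 1
        else:
            raise ValueError(f"unknown choice: {choice!r}")

    scores: Dict[str, Dict[str, Optional[float]]] = {}
    for value, n in appearances.items():
        decisive = n - abstentions[value]
        scores[value] = {
            "neutrality_rate": abstentions[value] / n,
            # None when every dilemma involving this value was abstained on.
            "win_rate": wins[value] / decisive if decisive else None,
        }
    return scores


# Toy usage with made-up value labels.
example = revealed_scores([
    ("honesty", "privacy", "A"),
    ("honesty", "privacy", "abstain"),
    ("privacy", "care", "B"),
    ("honesty", "care", "A"),
])
```

Reporting the neutrality rate next to each win rate keeps it visible how much of a value's revealed score comes from decisive choices. A value whose dilemmas were all abstained on gets no win rate at all.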