Evaluating Alignment of Behavioral Dispositions in LLMs

📅 2026-02-11
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses the lack of systematic evaluation of whether large language models (LLMs) exhibit behavioral tendencies aligned with humans in social contexts. The authors transform psychological self-report scales into 2,500 human-validated situational judgment tests (SJTs) to assess behavioral alignment across 25 LLMs in realistic user-assistant interactions. Leveraging large-scale preference annotations from 550 participants, the study finds that models frequently encourage emotional expression in situations where restraint is normative; that under high human consensus, smaller models deviate substantially while even state-of-the-art models misalign in 15–20% of cases; and that models tend to be overconfident in low-consensus scenarios. The work uncovers a systematic gap between LLMs' stated values and their actual behaviors, offering a scalable, psychology-grounded framework for alignment evaluation.
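The page does not specify the paper's exact alignment metrics, but the consensus and overconfidence findings above suggest a simple per-SJT comparison between the human annotator distribution and the model's action distribution. Below is a minimal, hypothetical Python sketch of that idea; the `consensus_threshold` value, the overconfidence heuristic, and all function names are assumptions, not the authors' method.

```python
# Hypothetical sketch: the paper's actual metrics are not given on this page.
# Assumes each SJT offers a fixed set of candidate actions, with 10 human
# annotator choices and a model probability distribution over the same actions.
from collections import Counter


def human_consensus(choices: list[str]) -> tuple[str, float]:
    """Return the modal action and the fraction of annotators who chose it."""
    (action, count), = Counter(choices).most_common(1)
    return action, count / len(choices)


def evaluate_sjt(model_dist: dict[str, float], choices: list[str],
                 consensus_threshold: float = 0.8) -> dict:
    """Compare a model's action distribution against human preferences."""
    modal_action, consensus = human_consensus(choices)
    model_top = max(model_dist, key=model_dist.get)
    return {
        "human_modal": modal_action,
        "consensus": consensus,
        "model_top": model_top,
        "model_confidence": model_dist[model_top],
        # Misalignment: the model's top action differs from the human mode
        # in a high-consensus scenario.
        "misaligned": consensus >= consensus_threshold
                      and model_top != modal_action,
        # Overconfidence: the model concentrates mass on one action even
        # though human preferences are split.
        "overconfident": consensus < consensus_threshold
                         and model_dist[model_top] > consensus,
    }


# Example: 10 annotators split 6/4, while the model puts 0.9 on one action.
print(evaluate_sjt({"A": 0.9, "B": 0.1}, ["A"] * 6 + ["B"] * 4))
```

Under this reading, finding (1) corresponds to the `overconfident` flag firing often in low-consensus SJTs, and finding (2) to the `misaligned` flag firing in 15–20% of high-consensus SJTs even for frontier models.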

📝 Abstract
As LLMs integrate into our daily lives, understanding their behavior becomes essential. In this work, we focus on behavioral dispositions, the underlying tendencies that shape responses in social contexts, and introduce a framework to study how closely the dispositions expressed by LLMs align with those of humans. Our approach is grounded in established psychological questionnaires but adapts them for LLMs by transforming human self-report statements into Situational Judgment Tests (SJTs). These SJTs assess behavior by eliciting natural recommendations in realistic user-assistant scenarios. We generate 2,500 SJTs, each validated by three human annotators, and collect preferred actions from 10 annotators per SJT, from a large pool of 550 participants. In a comprehensive study involving 25 LLMs, we find that models often do not reflect the distribution of human preferences: (1) in scenarios with low human consensus, LLMs consistently exhibit overconfidence in a single response; (2) when human consensus is high, smaller models deviate significantly, and even some frontier models do not reflect the consensus in 15-20% of cases; (3) traits can exhibit cross-LLM patterns, e.g., LLMs may encourage emotion expression in contexts where human consensus favors composure. Lastly, mapping psychometric statements directly to behavioral scenarios presents a unique opportunity to evaluate the predictive validity of self-reports, revealing considerable gaps between LLMs' stated values and their revealed behavior.
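To make the self-report-to-SJT transformation concrete: the abstract describes recasting a psychometric statement as a realistic user-assistant scenario whose preferred action reveals a disposition. The sketch below illustrates that idea in Python; the example item, scenario text, candidate actions, and the `sjt_prompt` helper are hypothetical, as the paper's actual templates and generation pipeline are not shown on this page.

```python
# Hypothetical illustration of the self-report-to-SJT idea from the abstract.

SELF_REPORT_ITEM = "I keep my emotions under control."  # example psychometric statement

# An SJT recasts the statement as a realistic user-assistant scenario with
# candidate actions, so behavior is elicited rather than self-reported.
# The emotion-expression vs. composure contrast mirrors finding (3).
sjt = {
    "scenario": (
        "A user writes: 'My coworker took credit for my work in a meeting. "
        "I'm about to reply-all to the whole team. What should I do?'"
    ),
    "actions": {
        "A": "Encourage the user to express their frustration openly in the reply.",
        "B": "Recommend staying composed and raising the issue privately first.",
    },
}


def sjt_prompt(sjt: dict) -> str:
    """Render the SJT as a prompt asking for a natural recommendation."""
    options = "\n".join(f"{key}. {text}" for key, text in sjt["actions"].items())
    return f"{sjt['scenario']}\n\nWhich response do you recommend?\n{options}"


print(sjt_prompt(sjt))
```

Comparing a model's answer here against its agreement with the stated item ("I keep my emotions under control") is, in spirit, how a direct statement-to-scenario mapping enables testing the predictive validity of self-reports.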
Problem

Research questions and friction points this paper is trying to address.

behavioral alignment
large language models
situational judgment tests
human preferences
behavioral dispositions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Behavioral Alignment
Situational Judgment Tests
Large Language Models
Psychometric Validation
Human Preference Modeling