Post-training makes large language models less human-like

📅 2026-05-08

🤖 AI Summary
The extent to which large language models accurately simulate human behavior remains unclear. This work introduces the Psych-201 dataset to enable the first large-scale, quantitative evaluation of how post-training affects alignment between model outputs and human behavior. Through comparative experiments across model families and scales, the study finds that post-training generally degrades models’ ability to fit human behavioral patterns—a misalignment that is further exacerbated in newer-generation models. Additionally, persona-induction techniques fail to consistently improve behavioral prediction accuracy at the individual level. These findings establish a new benchmark for evaluating behavioral alignment and provide empirical evidence critical for guiding future model development toward more human-like behavior.
📝 Abstract
Large language models (LLMs) are increasingly used as surrogates for human participants, but it remains unclear which models best capture human behavior and why. To address this, we introduce Psych-201, a novel dataset that enables us to measure behavioral alignment at scale. We find that post-training -- the stage that turns base models into useful assistants -- consistently reduces alignment with human behavior across model families, sizes, and objectives. Moreover, this misalignment widens in newer model generations even as base models continue to improve. Finally, we find that persona-induction -- a popular technique for eliciting human-like behavior by conditioning models on participant-specific information -- does not improve predictions at the level of individuals. Taken together, our results suggest that the very processes that are currently employed to turn LLMs into useful assistants also make them less accurate models of human behavior.
Problem

Research questions and friction points this paper is trying to address.

large language models
post-training
human behavior
behavioral alignment
persona-induction
Innovation

Methods, ideas, or system contributions that make the work stand out.

post-training
behavioral alignment
large language models
Psych-201
persona-induction
Marcel Binz
Helmholtz Munich
cognitive science, machine learning, large language models, automated science, in-context learning
Elif Akata
Helmholtz Munich, University of Tübingen
machine learning, cognitive science
Abdullah Almaatouq
Massachusetts Institute of Technology
Mohammed Alsobay
Microsoft Research
computational social science, digital experimentation, human-AI interaction
Oleksii Ariasov
Franziska Brändle
University of Oxford
Cognitive Science, Decision Making, Exploration, Intrinsic Motivation
David Broska
Stanford University
Computational Social Science, Economic Sociology, Social Inequality, Social Psychology
Jason W. Burton
University of Copenhagen
computational social science, cognitive science, decision making, collective intelligence
Nuno Busch
Frederick Callaway
Postdoc in Psychology, NYU & Harvard
Cognitive Science
Vanessa Cheung
Brian Christian
University of Oxford
Artificial Intelligence, Machine Learning, Cognitive Science, Computational Neuroscience
Julian Coda-Forno
ELLIS, Helmholtz/TUM
LLMs, Cognitive Science, Meta-learning, Deep Learning, Reinforcement Learning
Can Demircan
Helmholtz Munich
machine learning, cognitive science
Vittoria Dentella
Maria K. Eckstein
Noémi Éltető
Google DeepMind
AI for science, cognitive science
Michael Franke
University of Tübingen
Pragmatics (Formal, Experimental & Computational), Probabilistic Modeling, Language Evolution, Psycholinguistics
Thomas L. Griffiths
Professor of Psychology and Computer Science, Princeton University
Computational Models of Cognition, Cognitive Science, Machine Learning, Cognitive Psychology, Bayesian Statistics
Fritz Günther
Department of Psychology, Humboldt-Universität zu Berlin
semantic memory, language models, conceptual combination, form-meaning mapping, vision models
Susanne Haridi
Sebastian Hellmann
Institut für Angewandte Informatik e.V.
Knowledge Engineering
Stefan Herytash
Linus Hof
Eleanor Holton