Stories of Your Life as Others: A Round-Trip Evaluation of LLM-Generated Life Stories Conditioned on Rich Psychometric Profiles

📅 2026-04-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This study addresses a critical gap in personality-conditioned generation with large language models (LLMs), which typically rely on self-report questionnaires without validation against real psychometric data. Leveraging empirically measured personality profiles from 290 participants, the authors prompted ten LLMs to generate first-person life narratives and employed three independent scoring models to infer personality traits solely from these texts, establishing a closed-loop “generation–reconstruction” evaluation framework. This approach, for the first time, uses authentic human psychological data both to drive and validate LLM personality expression. Results demonstrate that generated narratives not only reflect individual differences but also reproduce emotion response patterns observed in real human dialogue. Personality reconstruction achieved a correlation of 0.750—equivalent to 85% of human test–retest reliability—with nine out of ten behavioral traits showing significant alignment with ground-truth data, surpassing the limitations of superficial alignment methods.
📝 Abstract
Personality traits are richly encoded in natural language, and large language models (LLMs) trained on human text can simulate personality when conditioned on persona descriptions. However, existing evaluations rely predominantly on questionnaire self-report by the conditioned model, are limited in architectural diversity, and rarely use real human psychometric data. Without addressing these limitations, it remains unclear whether personality conditioning produces psychometrically informative representations of individual differences or merely superficial alignment with trait descriptors. To test how robustly LLMs can encode personality into extended text, we condition LLMs on real psychometric profiles from 290 participants to generate first-person life story narratives, and then task independent LLMs to recover personality scores from those narratives alone. We show that personality scores can be recovered from the generated narratives at levels approaching human test-retest reliability (mean r = 0.750, 85% of the human ceiling), and that recovery is robust across 10 LLM narrative generators and 3 LLM personality scorers spanning 6 providers. Decomposing systematic biases reveals that scoring models achieve their accuracy while counteracting alignment-induced defaults. Content analysis of the generated narratives shows that personality conditioning produces behaviourally differentiated text: nine of ten coded features correlate significantly with the same features in participants' real conversations, and personality-driven emotional reactivity patterns in narratives replicate in real conversational data. These findings provide evidence that the personality-language relationship captured during pretraining supports robust encoding and decoding of individual differences, including characteristic emotional variability patterns that replicate in real human behaviour.
Problem

Research questions and friction points this paper is trying to address.

personality conditioning
psychometric profiles
large language models
individual differences
narrative generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

round-trip evaluation
psychometric profiles
personality-conditioned generation
LLM personality decoding
behavioral differentiation
🔎 Similar Papers
No similar papers found.