Can LLMs Truly Embody Human Personality? Analyzing AI and Human Behavior Alignment in Dispute Resolution

πŸ“… 2026-02-07
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This study investigates whether large language models (LLMs) can authentically replicate human behavioral patterns in conflict resolution as shaped by personality traits. To this end, we propose the first interpretable evaluation framework for aligning AI behavior with human behavior, grounded in the Big Five personality model. We construct a dialogue dataset that pairs specific personality profiles with conflict scenarios and employ quantifiable behavioral metrics to compare LLMs’ strategic choices and conflict outcomes against those of humans. Experimental results reveal significant discrepancies between current mainstream LLMs and human behavior in personality-driven conflict interactions, raising critical concerns about the reliability of these models as behavioral proxies in social applications.

πŸ“ Abstract
Large language models (LLMs) are increasingly used to simulate human behavior in social settings such as legal mediation, negotiation, and dispute resolution. However, it remains unclear whether these simulations reproduce the personality-behavior patterns observed in humans. Human personality shapes how individuals navigate social interactions, including their strategic choices and behaviors in emotionally charged exchanges. This raises the question: Can LLMs, when prompted with personality traits, reproduce personality-driven differences in human conflict behavior? To explore this, we introduce an evaluation framework that enables direct comparison of human-human and LLM-LLM behaviors in dispute resolution dialogues with respect to Big Five Inventory (BFI) personality traits. This framework provides a set of interpretable metrics related to strategic behavior and conflict outcomes. We additionally contribute a novel dataset creation methodology for LLM dispute resolution dialogues whose scenarios and personality traits are matched to those of human conversations. Finally, we demonstrate the use of our evaluation framework with three contemporary closed-source LLMs and show significant divergences in how personality manifests in conflict across different LLMs compared to human data, challenging the assumption that personality-prompted agents can serve as reliable behavioral proxies in socially impactful applications. Our work highlights the need for psychological grounding and validation in AI simulations before real-world use.
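The abstract's core comparison can be illustrated with a minimal sketch: correlate a BFI trait score with a behavioral metric separately for human and LLM dialogue groups, then measure the gap between the two correlations. All data, the concession-rate metric, and the variable names here are illustrative assumptions, not the paper's actual metrics or results.

```python
# Hypothetical sketch: does a Big Five trait (here: agreeableness) predict
# a behavioral metric (here: fraction of concession moves per dialogue) the
# same way for humans and for personality-prompted LLM agents?
# All records below are toy data for illustration only.
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    varx = sum((x - mx) ** 2 for x in xs)
    vary = sum((y - my) ** 2 for y in ys)
    return cov / (varx * vary) ** 0.5

# (agreeableness score on 1-5 scale, concession-move rate per dialogue)
human = [(4.5, 0.62), (2.1, 0.18), (3.8, 0.55), (1.9, 0.22), (4.9, 0.71)]
llm   = [(4.5, 0.49), (2.1, 0.49), (3.8, 0.51), (1.9, 0.51), (4.9, 0.50)]

r_human = pearson(*zip(*human))   # strong positive trait-behavior link
r_llm = pearson(*zip(*llm))       # near-flat: trait barely moves behavior
alignment_gap = abs(r_human - r_llm)
print(f"human r={r_human:.2f}, LLM r={r_llm:.2f}, gap={alignment_gap:.2f}")
```

In this toy setup the human group shows a strong positive trait-behavior correlation while the LLM group's behavior is nearly insensitive to the prompted trait, so the alignment gap is large; the paper's framework applies this kind of per-trait, per-metric comparison with interpretable behavioral measures rather than a single correlation.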
Problem

Research questions and friction points this paper is trying to address.

personality-behavior alignment
large language models
dispute resolution
Big Five personality
human-AI behavioral simulation
Innovation

Methods, ideas, or system contributions that make the work stand out.

personality-behavior alignment
dispute resolution
large language models
evaluation framework
Big Five Inventory
πŸ”Ž Similar Papers
No similar papers found.