🤖 AI Summary
This work addresses the inconsistency and lack of interpretability in large language models' (LLMs) personality trait maintenance during dyadic dialogues. We conduct the first systematic evaluation of LLMs' stability in aligning with the OCEAN five-factor personality model. To this end, we propose a dual-agent collaborative dialogue generation framework, augmented by multiple judge agents that perform personality reverse-engineering and quantify consistency, thereby establishing a reproducible paradigm for prediction-consistency analysis. Experimental results show that while prompt engineering enables rudimentary persona generation, sustained personality coherence is highly sensitive to model selection and dialogue configuration. Cross-model discriminative consistency is low, and high/low trait combinations exhibit pronounced performance imbalance. Our findings expose structural fragility in current personality alignment approaches, providing both methodological foundations and empirical evidence for designing trustworthy conversational agents.
📝 Abstract
Large Language Models (LLMs) are widely used as conversational agents in sectors such as education, law, and medicine. However, LLMs often exhibit context-shifting behaviour, resulting in a lack of consistent and interpretable personality-aligned interactions. Their adherence to assigned psychological traits lacks comprehensive analysis, especially in dyadic (pairwise) conversations. We examine this challenge from two viewpoints. First, two conversational agents generate a discourse on a given topic, each assigned a personality from the OCEAN framework (Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism) with a High/Low level for each trait. Second, multiple judge agents infer the originally assigned traits, allowing us to explore prediction consistency, inter-model agreement, and alignment with the assigned personality. Our findings indicate that while LLMs can be guided toward personality-driven dialogue, their ability to maintain personality traits varies significantly depending on the combination of models and discourse settings. These inconsistencies emphasise the challenges in achieving stable and interpretable personality-aligned interactions in LLMs.
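The generate-then-judge pipeline described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the persona prompt wording, the stubbed `call_llm` hook, and the simple trait-accuracy metric are all assumptions, standing in for real LLM calls and whatever consistency measure the authors use.

```python
from dataclasses import dataclass

# The five OCEAN traits, each assigned High or Low per agent.
TRAITS = ["Openness", "Conscientiousness", "Extraversion",
          "Agreeableness", "Neuroticism"]

@dataclass
class Persona:
    levels: dict  # trait name -> "High" or "Low"

def persona_prompt(persona: Persona) -> str:
    # Hypothetical system prompt; real prompt engineering would be richer.
    desc = ", ".join(f"{lvl} {t}" for t, lvl in persona.levels.items())
    return f"Adopt a persona with {desc}. Stay in character throughout."

def call_llm(prompt: str, history: list) -> str:
    # Stub: a real run would query an LLM API here.
    return f"[reply {len(history)}]"

def generate_dialogue(p_a: Persona, p_b: Persona, topic: str, turns: int = 4):
    # Two persona-conditioned agents alternate turns on the topic.
    history = []
    for i in range(turns):
        persona = p_a if i % 2 == 0 else p_b
        history.append(call_llm(persona_prompt(persona) + f" Topic: {topic}",
                                history))
    return history

def judge(dialogue: list) -> dict:
    # Stub judge: a real judge LLM would infer High/Low per trait
    # from the dialogue text (personality reverse-engineering).
    return {t: "High" for t in TRAITS}

def consistency(assigned: dict, predicted: dict) -> float:
    # Fraction of traits where the judge's inference matches the assignment.
    hits = sum(assigned[t] == predicted[t] for t in TRAITS)
    return hits / len(TRAITS)
```

With the stubs replaced by real model calls, the same `consistency` score can be computed per judge and compared across judges to measure inter-model agreement.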