🤖 AI Summary
This study addresses the overlooked influence of human-like personality diversity on reasoning capabilities in large language models, which are typically optimized for uniform performance metrics. Through unsupervised continual pretraining on domain-specific corpora to simulate experiential accumulation, the authors quantify model personalities using the Big Five framework and analyze linguistic features, such as imperative frequency and lexical diversity, to investigate the relationship between personality traits and reasoning behavior. The work reveals, for the first time, a bimodal distribution of model capabilities and introduces the "suppression advantage" phenomenon, identifying two high-performing archetypes: "expressive generalists" and "suppressed specialists." It demonstrates that reduced extraversion enhances complex reasoning performance and establishes a causal link between linguistic characteristics in training data and emergent model personality, thereby offering a viable pathway toward deliberate "personality engineering."
📝 Abstract
Human problem-solving is enriched by a diversity of styles and personality traits, yet the development of Large Language Models (LLMs) has largely prioritized uniform performance benchmarks that favour specific behavioural tendencies such as assertiveness. To investigate how diverse experiences shape machine personality and influence problem-solving, this study employs continued pre-training to expose models to domain-specific texts in an unsupervised manner, simulating the accumulation of experience. By adapting the Big Five framework via the Machine Personality Inventory (MPI), we quantify the personality traits of these model variants and analyse their relationship to linguistic style and reasoning behaviour. The findings reveal that model competence is bimodal, peaking at "Expressive Generalists" and "Suppressed Specialists," and identify a "Suppression Advantage" whereby reduced social traits enhance complex reasoning performance. This study further establishes a causal link between linguistic features of the training data, such as imperative frequency and lexical diversity, and emergent model personality, providing a roadmap for "Personality Engineering".