AI Summary
Existing approaches to personality control in large language models rely on static prompts or costly fine-tuning, which makes dynamic and composable personality expression difficult. This work proposes a training-free framework that shows, for the first time, that personality traits in the model's activation space are approximately orthogonal and amenable to vector arithmetic. Using contrastive activation analysis, the method extracts personality basis vectors and enables fine-grained control through scaling, addition, subtraction, and context-aware dynamic composition. On PersonalityBench, the approach achieves a mean score of 9.60, nearly matching the fine-tuned upper bound of 9.61, and it attains win rates of up to 91% on the Persona-Evolve benchmark, substantially outperforming existing training-free methods.
Abstract
Current methods for personality control in Large Language Models rely on static prompting or expensive fine-tuning, failing to capture the dynamic and compositional nature of human traits. We introduce PERSONA, a training-free framework that achieves fine-tuning-level performance through direct manipulation of personality vectors in activation space. Our key insight is that personality traits appear as extractable, approximately orthogonal directions in the model's representation space that support algebraic operations. The framework operates through three stages: Persona-Base extracts orthogonal trait vectors via contrastive activation analysis; Persona-Algebra enables precise control through vector arithmetic (scalar multiplication for intensity, addition for composition, subtraction for suppression); and Persona-Flow achieves context-aware adaptation by dynamically composing these vectors during inference. On PersonalityBench, our approach achieves a mean score of 9.60, nearly matching the supervised fine-tuning upper bound of 9.61 without any gradient updates. On our proposed Persona-Evolve benchmark for dynamic personality adaptation, we achieve win rates of up to 91% across diverse model families. These results provide evidence that aspects of LLM personality are mathematically tractable, opening new directions for interpretable and efficient behavioral control.
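The extraction and arithmetic stages described above can be illustrated with a small, self-contained sketch. It is not the authors' implementation: it assumes a trait direction can be approximated by the difference of mean activations between trait-eliciting and trait-suppressing prompts (one common form of contrastive activation analysis), and it uses synthetic 8-dimensional "activations" in place of real model hidden states. All names (`trait_vector`, `compose`, `steer`, `fake_activations`) are hypothetical, and the context-dependent choice of weights in Persona-Flow is not modeled.

```python
import math
import random

DIM = 8                      # toy hidden size (assumption; real models are far larger)
rng = random.Random(0)       # fixed seed so the sketch is reproducible


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def trait_vector(pos_acts, neg_acts):
    """Difference of mean activations (trait-eliciting minus trait-suppressing)."""
    mu_pos, mu_neg = mean_vector(pos_acts), mean_vector(neg_acts)
    return normalize([p - n for p, n in zip(mu_pos, mu_neg)])


def cosine(u, v):
    """Cosine similarity; values near 0 indicate near-orthogonal directions."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def compose(weights, basis):
    """Weighted sum of trait vectors: sign adds or suppresses, magnitude sets intensity."""
    out = [0.0] * DIM
    for name, w in weights.items():
        for i, x in enumerate(basis[name]):
            out[i] += w * x
    return out


def steer(hidden, weights, basis):
    """Add the composed personality vector to a hidden state."""
    return [h + d for h, d in zip(hidden, compose(weights, basis))]


def fake_activations(axis, sign, n=200):
    """Synthetic stand-in for hidden states: Gaussian noise plus a shift along one axis."""
    rows = []
    for _ in range(n):
        row = [rng.gauss(0.0, 0.1) for _ in range(DIM)]
        row[axis] += sign
        rows.append(row)
    return rows


basis = {
    "extraversion": trait_vector(fake_activations(0, +1), fake_activations(0, -1)),
    "agreeableness": trait_vector(fake_activations(1, +1), fake_activations(1, -1)),
}

similarity = cosine(basis["extraversion"], basis["agreeableness"])

# Amplify extraversion (scale 2.0) while suppressing agreeableness (scale -1.0).
steered = steer([0.0] * DIM, {"extraversion": 2.0, "agreeableness": -1.0}, basis)
```

In a real setting the synthetic rows would be replaced by hidden states captured at a chosen layer, and `steer` would be applied inside the forward pass during generation. Which layer to use and how large the weights should be are choices the abstract does not specify.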