🤖 AI Summary
Personalized large language models face a fundamental stability–plasticity trade-off: existing alignment methods such as supervised fine-tuning impose an "alignment tax" that degrades general reasoning capability. To address this, we propose Soul Engine, a framework that geometrically models personality as a linear subspace orthogonal to the reasoning subspace within a frozen backbone model, thereby decoupling personality from core reasoning ability. The approach combines a dual-head architecture, a dynamically context-sampled benchmark dataset (SoulBench), and vector-arithmetic personality modulation, enabling zero-shot personality injection and deterministic behavioral control. Experiments demonstrate high-fidelity personality modeling (MSE = 0.011), with t-SNE visualization confirming that the personality manifold is continuous and orthogonal to reasoning. Crucially, Soul Engine achieves controllable personality customization without fine-tuning the backbone, preserving full reasoning capability and eliminating the alignment tax while keeping personalization interpretable.
📝 Abstract
Background: The deployment of personalized Large Language Models (LLMs) is currently constrained by the stability-plasticity dilemma. Prevailing alignment methods, such as Supervised Fine-Tuning (SFT), rely on stochastic weight updates that often incur an "alignment tax" -- degrading general reasoning capabilities.
Methods: We propose the Soul Engine, a framework based on the Linear Representation Hypothesis, which posits that personality traits exist as linear subspaces orthogonal to the model's reasoning representations. We introduce SoulBench, a dataset constructed via dynamic contextual sampling. Using a dual-head architecture on a frozen Qwen-2.5 base, we extract disentangled personality vectors without modifying the backbone weights.
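The dual-head design described above can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the backbone is stood in for by a fixed random projection (in practice it would be Qwen-2.5 with gradients disabled), and the dimensions, trait count, and head names (`personality_head`) are hypothetical assumptions. The point is structural: only the personality head carries trainable parameters, while the backbone and its language-model head stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not specify these.
D_HIDDEN = 64    # hidden size of the frozen backbone
D_PERSONA = 5    # e.g. one score per Big-Five trait

# Stand-in for the last-layer hidden state of a frozen backbone.
# In practice this would be Qwen-2.5 with requires_grad=False; here a
# fixed random matrix plays that role so the sketch is self-contained.
W_backbone = rng.standard_normal((D_HIDDEN, D_HIDDEN))

def frozen_backbone(x):
    # Frozen: W_backbone is never updated during training.
    return np.tanh(W_backbone @ x)

# The only trainable parameters: a linear personality readout fit
# against SoulBench trait labels (e.g. by ridge regression / MSE loss).
W_persona = rng.standard_normal((D_PERSONA, D_HIDDEN)) * 0.1

def personality_head(h):
    # Projects the frozen hidden state onto the personality subspace.
    return W_persona @ h

x = rng.standard_normal(D_HIDDEN)   # stand-in input embedding
h = frozen_backbone(x)              # backbone forward pass (no grads)
scores = personality_head(h)        # disentangled trait estimates
```

Because gradients only flow through `W_persona`, training the readout cannot perturb the backbone's weights, which is what removes the alignment tax in this framing.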
Results: Our experiments demonstrate three key results. First, High-Precision Profiling: The model achieves a Mean Squared Error (MSE) of 0.011 against psychological ground truth. Second, Geometric Orthogonality: t-SNE visualization confirms that personality manifolds are distinct and continuous, allowing for "Zero-Shot Personality Injection" that maintains original model intelligence. Third, Deterministic Steering: We achieve robust control over behavior via vector arithmetic, validated through extensive ablation studies.
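The steering claim has a simple geometric core that can be sketched in a few lines. This is an assumption-laden toy, not the paper's method: `v_reason` and `v_persona` are hypothetical single directions standing in for the two subspaces, and the Gram-Schmidt step enforces the orthogonality that the paper attributes to the learned vectors. It shows why adding a personality vector orthogonal to the reasoning direction leaves the reasoning component of the hidden state untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical hidden size

# Toy stand-ins for one reasoning direction and one personality direction.
v_reason = rng.standard_normal(d)
v_persona = rng.standard_normal(d)

# Gram-Schmidt: remove the reasoning component so v_persona is exactly
# orthogonal to v_reason (the paper learns this property; we impose it).
v_persona -= (v_persona @ v_reason) / (v_reason @ v_reason) * v_reason

def steer(h, alpha):
    # Deterministic latent intervention: h' = h + alpha * v_persona.
    return h + alpha * v_persona

h = rng.standard_normal(d)          # some hidden state
h_steered = steer(h, alpha=2.0)     # inject personality at strength 2.0

# Orthogonality means the projection onto the reasoning direction is
# unchanged, i.e. steering does not move the reasoning component.
proj_before = h @ v_reason
proj_after = h_steered @ v_reason
```

The same arithmetic explains why the intervention is deterministic: for a fixed `alpha` and `v_persona`, the shift applied to the latent is a constant vector, unlike prompting, which only shifts the output distribution stochastically.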
Conclusion: This work challenges the necessity of fine-tuning for personalization. By transitioning from probabilistic prompting to deterministic latent intervention, we provide a mathematically rigorous foundation for safe, controllable AI personalization.