🤖 AI Summary
Personalized large language models face a fundamental stability–plasticity trade-off: existing alignment methods such as supervised fine-tuning impose an "alignment tax" that degrades general reasoning capability. To address this, we propose Soul Engine, a framework that geometrically models personality as a linear subspace orthogonal to the reasoning subspace within a frozen backbone model, thereby decoupling personality from core reasoning ability. The approach combines a dual-head architecture, a dynamically context-sampled benchmark dataset (SoulBench), and vector-arithmetic personality modulation, enabling zero-shot personality injection and deterministic behavioral control. Experiments demonstrate high-fidelity personality modeling (MSE = 0.011), with t-SNE visualization confirming that the personality manifold is continuous and orthogonal to reasoning. Crucially, Soul Engine achieves controllable personality customization without fine-tuning the backbone, preserving full reasoning capability and eliminating the alignment tax while keeping personalization interpretable.
📝 Abstract
Background: The deployment of personalized Large Language Models (LLMs) is currently constrained by the stability-plasticity dilemma. Prevailing alignment methods, such as Supervised Fine-Tuning (SFT), rely on stochastic weight updates that often incur an "alignment tax" -- degrading general reasoning capabilities.
Methods: We propose the Soul Engine, a framework based on the Linear Representation Hypothesis, which posits that personality traits exist as linear subspaces orthogonal to the model's reasoning representations. We introduce SoulBench, a dataset constructed via dynamic contextual sampling. Using a dual-head architecture on a frozen Qwen-2.5 base, we extract disentangled personality vectors without modifying the backbone weights.
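The dual-head design described above can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the backbone is stood in for by a fixed random projection (in practice it would be Qwen-2.5 with gradients disabled), and the dimensions, trait count, and head names (`personality_head`) are hypothetical assumptions. The point is structural: only the personality head carries trainable parameters, while the backbone and its language-model head stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions -- the paper does not specify these.
D_HIDDEN = 64    # hidden size of the frozen backbone
D_PERSONA = 5    # e.g. one score per Big-Five trait

# Stand-in for the last-layer hidden state of a frozen backbone.
# In practice this would be Qwen-2.5 with requires_grad=False; here a
# fixed random matrix plays that role so the sketch is self-contained.
W_backbone = rng.standard_normal((D_HIDDEN, D_HIDDEN))

def frozen_backbone(x):
    # Frozen: W_backbone is never updated during training.
    return np.tanh(W_backbone @ x)

# The only trainable parameters: a linear personality readout fit
# against SoulBench trait labels (e.g. by ridge regression / MSE loss).
W_persona = rng.standard_normal((D_PERSONA, D_HIDDEN)) * 0.1

def personality_head(h):
    # Projects the frozen hidden state onto the personality subspace.
    return W_persona @ h

x = rng.standard_normal(D_HIDDEN)   # stand-in input embedding
h = frozen_backbone(x)              # backbone forward pass (no grads)
scores = personality_head(h)        # disentangled trait estimates
```

Because gradients only flow through `W_persona`, training the readout cannot perturb the backbone's weights, which is what removes the alignment tax in this framing.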
Results: Our experiments demonstrate three key results. First, High-Precision Profiling: The model achieves a Mean Squared Error (MSE) of 0.011 against psychological ground truth. Second, Geometric Orthogonality: t-SNE visualization confirms that personality manifolds are distinct and continuous, allowing for "Zero-Shot Personality Injection" that maintains original model intelligence. Third, Deterministic Steering: We achieve robust control over behavior via vector arithmetic, validated through extensive ablation studies.
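The steering claim has a simple geometric core that can be sketched in a few lines. This is an assumption-laden toy, not the paper's method: `v_reason` and `v_persona` are hypothetical single directions standing in for the two subspaces, and the Gram-Schmidt step enforces the orthogonality that the paper attributes to the learned vectors. It shows why adding a personality vector orthogonal to the reasoning direction leaves the reasoning component of the hidden state untouched.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64  # hypothetical hidden size

# Toy stand-ins for one reasoning direction and one personality direction.
v_reason = rng.standard_normal(d)
v_persona = rng.standard_normal(d)

# Gram-Schmidt: remove the reasoning component so v_persona is exactly
# orthogonal to v_reason (the paper learns this property; we impose it).
v_persona -= (v_persona @ v_reason) / (v_reason @ v_reason) * v_reason

def steer(h, alpha):
    # Deterministic latent intervention: h' = h + alpha * v_persona.
    return h + alpha * v_persona

h = rng.standard_normal(d)          # some hidden state
h_steered = steer(h, alpha=2.0)     # inject personality at strength 2.0

# Orthogonality means the projection onto the reasoning direction is
# unchanged, i.e. steering does not move the reasoning component.
proj_before = h @ v_reason
proj_after = h_steered @ v_reason
```

The same arithmetic explains why the intervention is deterministic: for a fixed `alpha` and `v_persona`, the shift applied to the latent is a constant vector, unlike prompting, which only shifts the output distribution stochastically.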
Conclusion: This work challenges the necessity of fine-tuning for personalization. By transitioning from probabilistic prompting to deterministic latent intervention, we provide a mathematically rigorous foundation for safe, controllable AI personalization.