🤖 AI Summary
This work addresses the limited efficacy of conventional persona prompting in steering large language models (LLMs) toward domain-specific expertise. We propose an intervenable role-vector mechanism: 29 domain-specific role vectors are extracted from intermediate-layer activations, and the model's internal representations are modulated directly via two complementary strategies, activation addition and directional ablation, to selectively enhance domain-relevant capabilities while suppressing irrelevant ones. Unlike external prompting, this internal-representation paradigm requires no modification of the input text, yielding consistent gains across multiple domain benchmarks (+3.2% average improvement) with negligible impact on out-of-domain tasks (Δ < 0.4%). To our knowledge, this is the first work to formulate persona modeling as an intervenable vector space and to empirically validate its cross-domain generalization. The approach establishes a novel paradigm for controllable reasoning through fine-grained, activation-level steering of internal representations.
📝 Abstract
The influence of personas on Large Language Models (LLMs) has been widely studied, yet their direct impact on performance remains uncertain. This work explores a novel approach to guiding LLM behaviour through role vectors, an alternative to persona-based prompting. We construct 29 role vectors derived from model activations and evaluate their impact on benchmark performance across multiple domains. Our analysis investigates whether these vectors can effectively steer models toward domain-specific expertise. We evaluate two key interventions: (i) activation addition, which reinforces role-specific directions, and (ii) directional ablation, which removes them. Results on well-established benchmarks indicate that role vectors do, in fact, influence model behaviour, improving task performance in relevant domains while only marginally affecting unrelated tasks. This, in turn, suggests that manipulating internal model representations has a greater impact on outcomes than persona-based prompting.
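The two interventions are simple vector operations on a hidden state: activation addition shifts the state along the role direction, while directional ablation projects out its component along that direction. The sketch below illustrates this on a toy NumPy vector; the function names, the scaling coefficient `alpha`, and the use of a random "role vector" are illustrative assumptions, not details from the paper.

```python
import numpy as np

def activation_addition(h, v, alpha=1.0):
    # Reinforce the role-specific direction: h' = h + alpha * v
    return h + alpha * v

def directional_ablation(h, v):
    # Remove the component of h along v: h' = h - (h . v_hat) v_hat
    v_hat = v / np.linalg.norm(v)
    return h - np.dot(h, v_hat) * v_hat

# Toy example: an 8-dimensional "activation" and a hypothetical role vector.
rng = np.random.default_rng(0)
h = rng.normal(size=8)   # stand-in for an intermediate-layer activation
v = rng.normal(size=8)   # stand-in for an extracted role vector

h_steered = activation_addition(h, v, alpha=2.0)
h_ablated = directional_ablation(h, v)

# After ablation, the state carries no component along the role direction.
v_hat = v / np.linalg.norm(v)
assert abs(np.dot(h_ablated, v_hat)) < 1e-9
```

In practice such edits would be applied to transformer residual-stream activations via forward hooks at chosen layers; the linear algebra, however, is exactly what is shown here.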