The Effectiveness of Style Vectors for Steering Large Language Models: A Human Evaluation

📅 2026-01-29
🏛️ IEEE Access
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the challenge of controlling the emotional tone of large language model outputs at inference time. The authors propose an activation steering approach that modulates sentiment without prompt engineering or fine-tuning, by adjusting the intensity of a style vector (λ≈0.15). They introduce the first large-scale human evaluation of this technique, comprising over 7,000 ratings collected via the Prolific platform, assessing efficacy across multiple affective dimensions including disgust and fear. Results demonstrate that moderate intervention significantly amplifies target emotions—disgust (η²=0.616) and fear (η²=0.540)—while preserving text fluency. LLaMA-3 exhibits greater stability than Alpaca, achieving statistical significance (p<0.001) across all dimensions, with high inter-rater reliability (ICC=0.71–0.87). The study further reveals strong alignment between automated metrics and human-perceived quality.

📝 Abstract
Controlling the behavior of large language models (LLMs) at inference time is essential for aligning outputs with human preferences and safety requirements. Activation steering provides a lightweight alternative to prompt engineering and fine-tuning by directly modifying internal activations to guide generation. This research advances the literature in three significant directions. First, while previous work demonstrated the technical feasibility of steering emotional tone using automated classifiers, this paper presents the first human evaluation of activation steering for the emotional tone of LLM outputs, collecting over 7,000 crowd-sourced ratings from 190 participants via Prolific (n = 190). These ratings assess both perceived emotional intensity and overall text quality. Second, we find strong alignment between human and model-based quality ratings (mean r = 0.776, range 0.157–0.985), indicating that automatic scoring can serve as a proxy for perceived quality. Moderate steering strengths (λ ≈ 0.15) reliably amplify target emotions while preserving comprehensibility, with the strongest effects for disgust (η²p = 0.616) and fear (η²p = 0.540), and minimal effects for surprise (η²p = 0.042). Finally, upgrading from Alpaca to LLaMA-3 yielded more consistent steering, with significant effects across emotions and strengths (all p < 0.001). Inter-rater reliability was high (ICC = 0.71–0.87), underscoring the robustness of the findings. These results support activation-based control as a scalable method for steering LLM behavior across affective dimensions.
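The core mechanism described in the abstract—adding a scaled style vector to a model's internal activations during generation—can be sketched with a PyTorch forward hook. This is an illustrative sketch, not the authors' code: the function names, the toy linear layer standing in for a transformer block, and the all-ones style vector are all hypothetical; in practice the style vector would be a pre-computed emotion direction and λ the steering strength (≈0.15 in the paper).

```python
import torch

def make_steering_hook(style_vector: torch.Tensor, lam: float = 0.15):
    """Build a forward hook that adds lam * style_vector to a layer's output.

    Hypothetical helper illustrating activation steering; not the paper's code.
    """
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        steered = hidden + lam * style_vector.to(hidden.dtype)
        if isinstance(output, tuple):
            return (steered,) + output[1:]
        return steered
    return hook

# Toy stand-in for a transformer block: a bias-free linear layer.
layer = torch.nn.Linear(4, 4, bias=False)
style = torch.ones(4)  # hypothetical pre-computed style (emotion) direction
handle = layer.register_forward_hook(make_steering_hook(style, lam=0.15))

x = torch.zeros(1, 4)
out = layer(x)  # zero input -> zero pre-hook output, plus 0.15 * style
handle.remove()
```

With a real LLM, the same hook would be registered on a chosen transformer layer so that every forward pass during decoding is nudged along the style direction, which is how activation steering avoids any change to prompts or weights.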
Problem

Research questions and friction points this paper is trying to address.

large language models
activation steering
emotional tone
human evaluation
behavior control
Innovation

Methods, ideas, or system contributions that make the work stand out.

activation steering
human evaluation
emotional tone control
large language models
affective alignment
Diaoulé Diallo
German Aerospace Center (DLR), Institute of Software Technology, Germany
Katharina Dworatzyk
German Aerospace Center (DLR), Institute of Software Technology, Germany
Sophie Jentzsch
German Aerospace Center (DLR), Institute of Software Technology, Germany
Peer Schütt
German Aerospace Center (DLR), Institute of Software Technology, Germany
Sabine Theis
Group Lead at German Aerospace Centre (DLR)
Human Factors in Software Engineering
Information Visualization
HCI
Ergonomics
Health Informatics
Tobias Hecking
German Aerospace Center (DLR), Institute of Software Technology, Germany