Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

📅 2026-05-11
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
In multi-turn dialogue, standard residual-stream activation interventions lose behavioral control and coherence because of KV-cache contamination. This work proposes Gated Cropped Attention-Delta steering (GCAD), which extracts the system prompt's contribution to self-attention as the intervention signal and adds a token-level gating mechanism so that the intervention follows the model's own prompt-dependent regulation pathway. This alignment mitigates KV-cache contamination. Experiments show that GCAD substantially improves long-horizon consistency: average coherence drift improves from −18.6 to −1.9, and the trait expression rate at turn 10 rises from 78.0% to 93.1%.
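The summary describes GCAD only at a high level. As a rough illustration of what "the system prompt's contribution to self-attention" and "token-level gating" could mean, the sketch below works on one query token in a single attention head. It computes the difference between the attention output with and without the system-prompt keys and values in context, then adds that difference back to a prompt-free attention output, scaled by a per-token gate. Everything here is an assumption, not the paper's implementation: the function names `prompt_attention_delta`, `clip_norm`, and `gated_steer` are hypothetical, using the attention mass a token places on system-prompt positions as its gate is just one plausible choice, and the abstract does not define "cropped", so treating it as bounding the delta's norm is a guess.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector.

    Returns the attention output and the attention weights over `keys`.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights


def prompt_attention_delta(query, sys_keys, sys_values, keys, values):
    """Hypothetical extraction step: how much the system prompt changes this
    token's attention output, plus the share of attention the prompt receives."""
    with_prompt, weights = attend(query, sys_keys + keys, sys_values + values)
    without_prompt, _ = attend(query, keys, values)
    delta = [a - b for a, b in zip(with_prompt, without_prompt)]
    prompt_mass = sum(weights[: len(sys_keys)])  # fraction in [0, 1]
    return delta, prompt_mass


def clip_norm(vec, max_norm):
    """Assumed reading of 'cropped': rescale vec so its L2 norm is <= max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [x * max_norm / norm for x in vec]


def gated_steer(attn_output, delta, gate):
    """Add the delta to an attention output, scaled by a per-token gate."""
    return [o + gate * d for o, d in zip(attn_output, delta)]


if __name__ == "__main__":
    # Toy 2-d example: one system-prompt token and two user tokens.
    query = [1.0, 0.0]
    user_k = [[0.0, 1.0], [1.0, 1.0]]
    user_v = [[0.0, 1.0], [1.0, 1.0]]
    delta, mass = prompt_attention_delta(query, [[1.0, 0.0]], [[2.0, 0.0]], user_k, user_v)
    base, _ = attend(query, user_k, user_v)  # prompt-free attention output
    print(gated_steer(base, clip_norm(delta, 1.0), mass))
```

The point of intervening at the attention output, as the summary frames it, is that the signal is shaped like something the model already produces when a system prompt is present. A gate of 0 leaves a token untouched, so tokens that would barely attend to the prompt receive little steering.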
πŸ“ Abstract
Activation steering controls language model behavior by adding directions to internal representations at inference time, but standard residual-stream steering can fail in stateful dialogue. We identify KV-cache contamination as a key failure mode: steered token states are stored and repeatedly reused, turning a local perturbation into cumulative coherence degradation. To address this challenge, we propose Gated Cropped Attention-Delta steering (GCAD), which extracts steering signals from system-prompt contributions to self-attention and applies them with token-level gating. Across persona-steering experiments, GCAD preserves trait control while substantially improving long-horizon coherence. On the main multi-turn benchmark, GCAD improves average coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1. These results suggest that activation steering becomes more reliable when interventions follow the prompt-mediated pathways that models already use for behavioral control.
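To see how reusing cached steered states can turn a fixed, local perturbation into a growing one, consider a deliberately crude scalar toy model. This is an assumption for illustration, not the paper's analysis. Every generated token receives the same residual-stream offset `steer`, and its state also picks up a fraction `gamma` of the average cached state it attends to. Because the cache already holds steered states, each new token inherits earlier offsets on top of its own. The function name `cached_offsets` and both parameters are hypothetical.

```python
def cached_offsets(steer, gamma, steps):
    """Toy scalar model of KV-cache reuse under residual-stream steering.

    Each new token's state offset is the steering offset `steer` plus `gamma`
    times the mean of all previously cached offsets (a stand-in for uniform
    attention over the cache). Returns every token's offset in generation order.
    """
    cache = [steer]  # the first steered token is stored in the cache
    for _ in range(steps):
        context = sum(cache) / len(cache)      # uniform attention over the cache
        cache.append(steer + gamma * context)  # steered state is cached and reused
    return cache


if __name__ == "__main__":
    print([round(x, 3) for x in cached_offsets(1.0, 0.5, 10)])
```

With `0 < gamma < 1` the offset creeps upward toward the fixed point `steer / (1 - gamma)` instead of staying at `steer`, while `gamma = 0` (no reuse of cached states) keeps every token at exactly `steer`. Real transformers are far from linear, so this only illustrates the compounding mechanism the abstract names, not its size.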
Problem

Research questions and friction points this paper is trying to address.

activation steering
KV-cache contamination
coherence degradation
stateful dialogue
language model behavior
Innovation

Methods, ideas, or system contributions that make the work stand out.

activation steering
KV-cache contamination
attention-level intervention
Gated Cropped Attention-Delta
prompt-mediated control