🤖 AI Summary
When guided to attend to user-specified text segments, large language models remain susceptible to non-discriminative patterns shared with irrelevant context. To mitigate this, the authors propose PRISM-Δ, a method that constructs a discriminative subspace by decomposing the difference between the cross-covariance matrices of positive and negative samples, enabling precise steering toward salient fragments. PRISM-Δ further incorporates a soft attention-head weighting mechanism that preserves weak yet informative signals and, for the first time, extends prompt highlighting to Value representations to exploit content-channel information. Compatible with FlashAttention, the approach matches or outperforms the best existing methods in 19 of 20 configurations across four benchmarks and five models, achieving up to a +10.6% relative performance gain while halving the fluency cost of steering, and delivering up to a +4.8% improvement on long-context retrieval tasks.
📝 Abstract
Prompt highlighting steers a large language model to prioritize user-specified text spans during generation. A key challenge is extracting steering directions that capture the difference between relevant and irrelevant contexts, rather than structural patterns shared by both. We propose PRISM-Δ (Projection-based Relevance-Informed Steering Method), which decomposes the difference between positive and negative cross-covariance matrices to maximize discriminative energy while eliminating shared directions. Each attention head receives a continuous softplus importance weight, letting weak-but-useful heads contribute at reduced strength. The framework extends naturally to Value representations, capturing content-channel signal that Key-only methods leave unused. Across four benchmarks and five models, PRISM-Δ matches or exceeds the best existing method on 19 of 20 configurations, with relative gains up to +10.6%, while halving the fluency cost of steering. PRISM-Δ also scales to long-context retrieval, outperforming the best existing method by up to +4.8% relative gain. PRISM-Δ is compatible with FlashAttention and adds negligible memory overhead.
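The two core mechanisms described above can be illustrated with a minimal sketch. Note that the abstract does not specify the exact algorithm, so the function names, shapes, and the use of SVD to decompose the covariance difference are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np

def steering_subspace(H_pos, H_neg, q, k=8):
    """Hypothetical sketch of the differential decomposition.

    H_pos, H_neg: (n, d) head representations from positive / negative
    samples; q: (n, d) query-side representations; k: subspace rank.
    Shapes and the SVD choice are illustrative assumptions.
    """
    # Per-class cross-covariance with the query side.
    C_pos = H_pos.T @ q / len(H_pos)   # (d, d)
    C_neg = H_neg.T @ q / len(H_neg)   # (d, d)
    # Decompose the *difference*: directions shared by both classes
    # cancel, leaving the maximally discriminative energy.
    U, S, _ = np.linalg.svd(C_pos - C_neg, full_matrices=False)
    return U[:, :k]                    # (d, k) discriminative subspace

def head_weight(score):
    """Continuous softplus importance weight: weak-but-useful heads
    contribute at reduced strength instead of being hard-pruned."""
    return np.log1p(np.exp(score))
```

A hard top-k head selection would zero out marginal heads entirely; the softplus weight instead shrinks them smoothly toward (but never exactly to) zero, which is how the abstract's "weak-but-useful" signal survives.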