🤖 AI Summary
Problem: Small language models (SLMs) often do not produce explicit chain-of-thought (CoT) reasoning on their own, and the usual ways to induce it involve fine-tuning the model or modifying its prompt.
Method: We propose cache steering, a lightweight, one-shot intervention that encodes CoT reasoning traces generated by GPT-4o into steering vectors and applies them directly to the model's key-value (KV) cache. This implicitly steers the model toward structured, multi-step reasoning without modifying the prompt or the model parameters.
Contribution/Results: The method intervenes in the KV cache once, using steering vectors, with no fine-tuning and no change to the original prompt. Compared with activation steering techniques that require continuous interventions, it offers greater hyperparameter stability, lower inference-time overhead, and easier integration. On diverse multi-step reasoning benchmarks, it improves both the qualitative structure of the model's reasoning and quantitative task accuracy.
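The summary does not say how the GPT-4o reasoning traces become the vectors that are added to the cache. One common recipe for steering vectors is a contrastive mean difference, and the sketch below assumes it: run the model on prompts whose demonstrations include a reasoning trace (positive) and on matched prompts without it (negative), then average, layer by layer, the difference between the cached key and value vectors at the last prompt position. The cache layout (a dict from layer index to per-position key and value lists) and all function names are hypothetical, and plain Python lists stand in for tensors.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical cache layout: layer index -> (keys, values),
# where keys[t] and values[t] are the cached vectors for token position t.
KVCache = Dict[int, Tuple[List[Vector], List[Vector]]]


def mean_difference(pos: List[Vector], neg: List[Vector]) -> Vector:
    """Average of the element-wise differences pos[i] - neg[i] over all pairs."""
    n, dim = len(pos), len(pos[0])
    return [sum(p[d] - q[d] for p, q in zip(pos, neg)) / n for d in range(dim)]


def build_steering_vectors(
    positive: List[KVCache], negative: List[KVCache]
) -> Dict[int, Tuple[Vector, Vector]]:
    """Per-layer (key, value) steering vectors from contrastive cache pairs.

    Assumption (hypothetical recipe): positive[i] and negative[i] come from
    matched prompts with and without a reasoning trace, and only the cached
    entry at the last position of each prompt is compared.
    """
    steering: Dict[int, Tuple[Vector, Vector]] = {}
    for layer in positive[0]:
        pos_k = [cache[layer][0][-1] for cache in positive]
        neg_k = [cache[layer][0][-1] for cache in negative]
        pos_v = [cache[layer][1][-1] for cache in positive]
        neg_v = [cache[layer][1][-1] for cache in negative]
        steering[layer] = (mean_difference(pos_k, neg_k), mean_difference(pos_v, neg_v))
    return steering


if __name__ == "__main__":
    # Toy caches: one layer, 2-dimensional vectors, prompts of different lengths.
    positive = [
        {0: ([[1.0, 0.0], [2.0, 2.0]], [[0.0, 1.0], [4.0, 0.0]])},
        {0: ([[3.0, 3.0]], [[2.0, 2.0]])},
    ]
    negative = [
        {0: ([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 0.0]])},
        {0: ([[1.0, 1.0]], [[0.0, 0.0]])},
    ]
    print(build_steering_vectors(positive, negative))
    # {0: ([1.5, 1.5], [2.5, 1.0])}
```

Comparing only the last position keeps the sketch independent of prompt length, so positive and negative prompts need not be token-aligned.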
📝 Abstract
We propose cache steering, a lightweight method for implicit steering of language models via a one-shot intervention applied directly to the key-value cache. To validate its effectiveness, we apply cache steering to induce chain-of-thought reasoning in small language models. Our approach leverages GPT-4o-generated reasoning traces to construct steering vectors that shift model behavior toward more explicit, multi-step reasoning without fine-tuning or prompt modifications. Experimental evaluations on diverse reasoning benchmarks demonstrate that cache steering improves both the qualitative structure of model reasoning and quantitative task performance. Compared to prior activation steering techniques that require continuous interventions, our one-shot cache steering offers substantial advantages in terms of hyperparameter stability, inference-time efficiency, and ease of integration, making it a more robust and practical solution for controlled generation.
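The one-shot intervention described in the abstract can be sketched as a single in-place edit of the cache after the prompt has been prefilled. This sketch assumes the steering vectors are scaled by coefficients `c_k` and `c_v` and added to the cached key and value at the final prompt position of each steered layer; the abstract does not specify the position, the layers, or the scaling, so these choices and the function names are hypothetical. After the edit, decoding simply reads the modified cache, whereas activation steering, as the abstract notes, requires continuous intervention during generation.

```python
from typing import Dict, List, Tuple

Vector = List[float]
# Hypothetical cache layout: layer index -> (keys, values), one vector per token position.
KVCache = Dict[int, Tuple[List[Vector], List[Vector]]]


def add_scaled(x: Vector, v: Vector, scale: float) -> Vector:
    """Return x + scale * v, element-wise."""
    return [a + scale * b for a, b in zip(x, v)]


def steer_cache_once(
    cache: KVCache,
    steering: Dict[int, Tuple[Vector, Vector]],
    c_k: float = 1.0,
    c_v: float = 1.0,
) -> None:
    """Apply the steering vectors to the cache exactly once, in place.

    Assumption (hypothetical): only the last cached position of each steered
    layer is shifted; earlier positions and unsteered layers are untouched.
    """
    for layer, (s_k, s_v) in steering.items():
        keys, values = cache[layer]
        keys[-1] = add_scaled(keys[-1], s_k, c_k)
        values[-1] = add_scaled(values[-1], s_v, c_v)


if __name__ == "__main__":
    # Toy prefilled cache for a 2-token prompt (one layer, 2-d vectors).
    cache = {0: ([[0.0, 0.0], [1.0, 1.0]], [[0.0, 0.0], [1.0, 0.0]])}
    steering = {0: ([1.5, 1.5], [2.5, 1.0])}
    steer_cache_once(cache, steering, c_k=2.0, c_v=1.0)
    print(cache)
    # {0: ([[0.0, 0.0], [4.0, 4.0]], [[0.0, 0.0], [3.5, 1.0]])}
    # Decoding would now proceed with this cache and no further intervention.
```

Keeping the edit to one call before decoding is what makes the intervention one-shot in this sketch: the generation loop itself needs no hooks, and the only tunable knobs are the two coefficients.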