🤖 AI Summary
This study investigates the internal mechanisms underlying reasoning in large language models, challenging the prevailing assumption that chain-of-thought (CoT) prompting is necessary to elicit reasoning capabilities. By applying sparse autoencoders (SAEs) to dissect model representations, the authors identify a small set of latent features causally linked to reasoning behavior. Using latent steering techniques, they directly activate these features without relying on CoT prompts. Experiments across multiple models and reasoning benchmarks demonstrate that manipulating even a single such feature can substantially improve reasoning accuracy, matching or surpassing standard CoT prompting, while yielding more efficient generations. This work provides the first evidence that large language models harbor reasoning-oriented computational pathways that can be externally activated, establishing CoT as one effective, but not exclusive, means of engaging this intrinsic reasoning mechanism.
📝 Abstract
Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the only mechanism for triggering reasoning. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally triggered, and that CoT prompting is one effective, but not the only, way of activating this mechanism rather than its necessary cause.
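The latent steering described above can be illustrated with a minimal sketch: take one column of an SAE decoder as a "reasoning" feature direction and add a scaled copy of it to a layer's residual-stream output via a forward hook. This is a toy illustration under stated assumptions, not the paper's implementation: the block, decoder matrix, feature index, and steering strength `alpha` here are all hypothetical placeholders.

```python
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    """Stand-in for one transformer block's residual-stream output."""
    def __init__(self, d_model):
        super().__init__()
        self.linear = nn.Linear(d_model, d_model)

    def forward(self, x):
        return self.linear(x)

def make_steering_hook(feature_direction, alpha):
    """Forward hook adding alpha * (unit feature direction) to every hidden state."""
    unit = feature_direction / feature_direction.norm()
    def hook(module, inputs, output):
        return output + alpha * unit
    return hook

torch.manual_seed(0)
d_model = 16
block = ToyBlock(d_model)

# Hypothetical SAE decoder (d_model x n_latents); pick an assumed "reasoning" latent.
sae_decoder = torch.randn(d_model, 128)
reasoning_idx = 7                      # assumed index, for illustration only
direction = sae_decoder[:, reasoning_idx]

x = torch.randn(2, 5, d_model)         # (batch, seq_len, d_model)

# Steered forward pass: the hook shifts the block output along the feature direction.
handle = block.register_forward_hook(make_steering_hook(direction, alpha=4.0))
steered = block(x)
handle.remove()

# Baseline pass without the hook, on the same input.
baseline = block(x)
delta = steered - baseline             # equals alpha along the unit direction
```

In a real model, the hook would be registered on a chosen transformer layer, and `direction` would come from a trained SAE rather than random weights; the key design point is that the intervention is a simple additive shift in activation space, with no change to the prompt.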