🤖 AI Summary
Large language models (LLMs) often generalize poorly on reasoning tasks because they over-rely on memorized training patterns, and the dynamic interplay between reasoning and memory remains poorly understood. Grounded in residual-stream analysis, this work identifies and validates a set of linear feature directions in the model's representation space; notably, a single dominant direction quantitatively captures the reasoning–memory trade-off. Using linear probing, causal intervention, and activation tracing, we isolate and modulate this direction. Experiments show that targeted intervention improves accuracy on unseen reasoning tasks, sharpens task discrimination, and yields more interpretable generation trajectories. Our study is the first to uncover a low-dimensional geometric mechanism underlying reasoning–memory switching in LLMs, pointing toward controllable reasoning and explicit memory–reasoning decoupling.
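The linear-probing step mentioned above can be sketched in a few lines. The snippet below is a minimal illustration, not the paper's code: it assumes residual-stream activations have already been extracted for reasoning and memory-recall prompts, and the file and variable names (`reasoning_acts.npy`, `memory_acts.npy`) are hypothetical.

```python
# Minimal sketch: fit a logistic probe on residual-stream activations from
# reasoning vs. memory-recall prompts, then take the probe's weight vector
# as the candidate linear feature direction.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: activations at one layer, shape (n_prompts, d_model),
# one array per task type, assumed precomputed by an extraction pipeline.
reasoning_acts = np.load("reasoning_acts.npy")
memory_acts = np.load("memory_acts.npy")

X = np.concatenate([reasoning_acts, memory_acts])
y = np.concatenate([np.ones(len(reasoning_acts)),   # 1 = reasoning prompt
                    np.zeros(len(memory_acts))])    # 0 = memory-recall prompt

probe = LogisticRegression(max_iter=1000).fit(X, y)

# The unit-normalized weight vector is the candidate "reasoning" direction.
direction = probe.coef_[0] / np.linalg.norm(probe.coef_[0])
print(f"probe accuracy: {probe.score(X, y):.3f}")
```

A probe accuracy well above chance suggests the two task types are linearly separable at that layer, which is the premise for the intervention experiments.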
📝 Abstract
Large language models (LLMs) excel on a variety of reasoning benchmarks, but previous studies suggest they sometimes struggle to generalize to unseen questions, potentially due to over-reliance on memorized training examples. However, the precise conditions under which LLMs switch between reasoning and memorization during text generation remain unclear. In this work, we provide a mechanistic understanding of LLMs' reasoning-memorization dynamics by identifying a set of linear features in the model's residual stream that govern the balance between genuine reasoning and memory recall. These features not only distinguish reasoning tasks from memory-intensive ones but can also be manipulated to causally influence model performance on reasoning tasks. Additionally, we show that intervening in these reasoning features helps the model more accurately activate the most relevant problem-solving capabilities during answer generation. Our findings offer new insights into the underlying mechanisms of reasoning and memory in LLMs and pave the way for the development of more robust and interpretable generative AI systems.
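To make the causal-intervention idea concrete, here is a hedged sketch of activation steering with a PyTorch forward hook: a scaled copy of a feature direction is added to the residual stream at one layer during generation. The model choice (`gpt2`), layer index, steering strength `alpha`, and the random stand-in `direction` are all illustrative assumptions; the paper's exact intervention protocol is not reproduced here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model for illustration only
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Stand-in for a probe-derived direction; in practice this would come from
# the linear-probing step, not from random noise.
direction = torch.randn(model.config.hidden_size)
direction = direction / direction.norm()
alpha = 4.0  # steering strength; sign and magnitude would be tuned empirically

def steer(module, inputs, output):
    # A GPT-2 block returns a tuple whose first element is the hidden states;
    # add the scaled direction to the residual stream at every position.
    hidden = output[0] + alpha * direction.to(output[0].dtype)
    return (hidden,) + output[1:]

layer_idx = 6  # hypothetical intervention layer
handle = model.transformer.h[layer_idx].register_forward_hook(steer)

prompt = "Q: If every widget has 3 gears and we have 7 widgets, how many gears?"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # restore the unmodified model
```

Comparing generations with the hook attached versus removed (or with the sign of `alpha` flipped) is the standard way to test whether the direction causally shifts the model between reasoning-like and recall-like behavior.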