Reasoning or Retrieval? A Study of Answer Attribution on Large Reasoning Models

๐Ÿ“… 2025-09-28
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Large reasoning models (LRMs) frequently produce final answers that are inconsistent with their own chain-of-thought (CoT) reasoning. We trace this to the competitive coexistence of two answer-generation mechanisms, reasoning and memory retrieval: current CoT fine-tuning is prone to hijacking by retrieval shortcuts, which degenerates the reward signal and erodes genuine reasoning capability. To address this, we propose FARL, a framework that integrates memory unlearning with reinforcement learning to actively suppress retrieval shortcuts and steer models toward reasoning-dominant behavior. Through controlled experiments, counterfactual memory interventions, and multi-dimensional comparative analysis, we show that retrieval shortcuts are pervasive across model scales, domains, and training paradigms. FARL significantly improves reasoning consistency and cross-task generalization. Our work establishes a new paradigm for developing trustworthy, interpretable reasoning models.

๐Ÿ“ Abstract
Large reasoning models (LRMs) exhibit unprecedented capabilities in solving complex problems through Chain-of-Thought (CoT) reasoning. However, recent studies reveal that their final answers often contradict their own reasoning traces. We hypothesize that this inconsistency stems from two competing mechanisms for generating answers: CoT reasoning and memory retrieval. To test this hypothesis, we conduct controlled experiments that challenge LRMs with misleading cues during reasoning and/or corrupted answers during retrieval. Our results across models and datasets confirm that both mechanisms operate simultaneously, with their relative dominance influenced by multiple factors: problem domains, model scales, and fine-tuning approaches (e.g., reinforcement learning vs. distillation). The findings reveal a critical limitation in current reasoning fine-tuning paradigms: models can exploit the retrieval mechanism as a shortcut, effectively "hacking" the reward signal and undermining genuine reasoning development. To address this challenge, we introduce FARL, a novel fine-tuning framework that integrates memory unlearning with reinforcement learning. By carefully suppressing retrieval shortcuts during the fine-tuning process, FARL promotes reasoning-dominant behavior and enhances generalizable reasoning capabilities.
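The controlled-experiment protocol above (misleading cues injected into the reasoning trace versus corrupted answers planted for retrieval) can be pictured with a small probe. The sketch below is a hypothetical reconstruction in Python: the prompt templates, helper names, and the string-matching attribution rule are all assumptions for illustration, not the authors' experimental code.

```python
# Hypothetical sketch of the answer-attribution probe; prompt templates,
# helper names, and the attribution rule are assumptions for illustration.

def make_probes(question: str, wrong_hint: str, corrupted_answer: str):
    """Build the two perturbed variants used to separate the mechanisms."""
    # Misleading cue: a false intermediate claim injected into the CoT path.
    cue_prompt = (
        f"{question}\n"
        f"Hint: a well-known identity implies the answer is {wrong_hint}."
    )
    # Corrupted retrieval: a wrong "memorized" answer planted verbatim,
    # mimicking what the model would recall instead of derive.
    retrieval_prompt = f"Known answer on record: {corrupted_answer}\n{question}"
    return cue_prompt, retrieval_prompt


def attribute_answer(cue_output: str, retrieval_output: str,
                     wrong_hint: str, corrupted_answer: str) -> str:
    """Crude attribution: which perturbation does the final answer follow?"""
    follows_cue = wrong_hint in cue_output
    follows_memory = corrupted_answer in retrieval_output
    if follows_cue and not follows_memory:
        return "reasoning-dominant"    # answer tracks the CoT, not memory
    if follows_memory and not follows_cue:
        return "retrieval-dominant"    # answer copied from planted memory
    return "mixed / inconclusive"      # both mechanisms (or neither) fired
```

Running both probes on the same model and comparing which perturbation the final answer follows gives a per-question estimate of whether reasoning or retrieval dominated.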
Problem

Research questions and friction points this paper is trying to address.

Investigating the competing reasoning and retrieval mechanisms behind answer generation in large models
Analyzing which factors determine the dominant answer-generation strategy
Developing a fine-tuning framework that suppresses retrieval-shortcut exploitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

FARL integrates memory unlearning with reinforcement learning (see the sketch after this list)
It suppresses retrieval shortcuts during the fine-tuning process
It promotes reasoning-dominant behavior and generalizable reasoning in large models
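A minimal sketch of how unlearning and reinforcement learning could be combined in a FARL-style update follows, in PyTorch-flavored Python. The surrogate losses, the `unlearn_weight` coefficient, and every identifier are assumptions; the paper's actual FARL algorithm is not reproduced on this page.

```python
# Hypothetical FARL-style update: a REINFORCE term that rewards correct
# CoT rollouts plus a gradient-ascent unlearning term on memorized
# (question -> answer) pairs. All names and weights are assumptions.

import torch


def farl_step(model, optimizer, reasoning_batch, memorized_batch,
              advantage: torch.Tensor, unlearn_weight: float = 0.1):
    """One combined update on a HuggingFace-style causal LM (assumed API).

    reasoning_batch / memorized_batch: dicts with input_ids and labels.
    advantage: baseline-adjusted scalar reward for the sampled rollout.
    """
    # Policy-gradient surrogate: raise the log-likelihood of rollouts whose
    # final answer was judged correct (advantage > 0), lower it otherwise.
    log_prob = -model(**reasoning_batch).loss   # mean token log-likelihood
    rl_loss = -advantage * log_prob

    # Unlearning term: ascend the loss on directly memorized answers so the
    # retrieval shortcut stops paying off during fine-tuning.
    unlearn_loss = -model(**memorized_batch).loss

    loss = rl_loss + unlearn_weight * unlearn_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)
```

In this sketch, the weight on the unlearning term governs how aggressively the retrieval shortcut is penalized relative to the reasoning reward.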
Yuhui Wang
Stony Brook University
Changjiang Li
Stony Brook University
Adversarial Machine Learning · Trustworthy Machine Learning · Data-driven Security
Guangke Chen
Stony Brook University
Jiacheng Liang
Stony Brook University
LLM Security · LLM Optimization
Ting Wang
Stony Brook University