Adaptive Retrieval helps Reasoning in LLMs -- but mostly if it's not used

📅 2026-02-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models are prone to hallucinations in complex mathematical reasoning due to reliance on static internal knowledge. This work proposes an adaptive retrieval-augmented architecture that enables the model to actively decide, during inference, whether to consult an external knowledge base, treating retrieval as a dynamic form of in-context learning. The study reveals that the model’s decision not to retrieve serves as a strong metacognitive signal of high performance, with retrieval providing significant benefits only in specific scenarios—such as when citing critical theorems. Evaluated on GSM8K and MATH-500 benchmarks, the approach combined with chain-of-thought (CoT) reasoning outperforms standard CoT even when no retrieval occurs, and dynamically adjusts retrieval frequency based on problem difficulty. These findings underscore the crucial role of self-assessment and selective retrieval in enhancing reasoning robustness.

📝 Abstract
Large Language Models (LLMs) often falter in complex reasoning tasks due to their static, parametric knowledge, leading to hallucinations and poor performance in specialized domains like mathematics. This work explores a fundamental principle for enhancing generative models: treating retrieval as a form of dynamic in-context learning. We test an adaptive retrieval-augmented architecture in which an LLM agent actively decides when to query an external knowledge base during its reasoning process. We compare this adaptive strategy against a standard Chain-of-Thought (CoT) baseline and a static retrieval approach on the GSM8K and MATH-500 benchmarks. Although our experiments show that static retrieval is inferior to CoT, adaptive retrieval exhibits interesting behavior: traces that include retrieved results perform slightly worse than CoT, while traces without retrieval actually perform better than CoT. This suggests that (a) retrieval only rarely helps reasoning (we show a few counterexamples, e.g. citing useful theorems) and (b) actively not using retrieval is indicative of good model performance. Furthermore, we find that the model scales its retrieval frequency with the difficulty of the problem, reinforcing that the decision to retrieve is a crucial metacognitive signal. The agent's ability to self-assess its knowledge and selectively engage with external information represents a key principle for building more robust and reliable generative models.
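The adaptive loop described in the abstract can be sketched in a few lines: at each reasoning step the model itself signals whether it wants to consult the external knowledge base, and any retrieved result is appended to the trace as extra in-context evidence. This is a minimal illustrative sketch, not the paper's implementation; all names (`model_wants_retrieval`, `KNOWLEDGE_BASE`, the `RETRIEVE:` marker) are hypothetical.

```python
# Hypothetical sketch of an adaptive-retrieval reasoning loop: the model
# emits a decision marker when it wants to consult an external knowledge
# base, and retrieval happens only on that signal. All names here are
# illustrative assumptions, not taken from the paper.

KNOWLEDGE_BASE = {
    "pythagorean theorem": "a^2 + b^2 = c^2 for a right triangle",
    "quadratic formula": "x = (-b +/- sqrt(b^2 - 4ac)) / (2a)",
}

def model_wants_retrieval(trace: str) -> bool:
    """Stand-in for the LLM's metacognitive decision: here we simply
    check whether the partial trace contains a retrieval marker."""
    return "RETRIEVE:" in trace

def retrieve(query: str) -> str:
    """Look up the query in the external knowledge base."""
    return KNOWLEDGE_BASE.get(query.lower().strip(), "(no entry found)")

def adaptive_reasoning_step(trace: str) -> str:
    """One step of the loop: retrieve only if the model asked for it,
    appending the result to the trace as in-context evidence."""
    if model_wants_retrieval(trace):
        query = trace.split("RETRIEVE:", 1)[1].splitlines()[0]
        trace += "\n[retrieved] " + retrieve(query)
    return trace

trace = "Step 1: the triangle is right-angled.\nRETRIEVE: Pythagorean theorem"
trace = adaptive_reasoning_step(trace)
print(trace.splitlines()[-1])  # → [retrieved] a^2 + b^2 = c^2 for a right triangle
```

A trace that never emits the marker passes through unchanged, which mirrors the paper's key observation: the no-retrieval path is itself an informative signal about the model's confidence.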
Problem

Research questions and friction points this paper is trying to address.

reasoning
retrieval
large language models
hallucination
metacognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive retrieval
reasoning in LLMs
metacognitive signal
dynamic in-context learning
retrieval-augmented generation
Srijan Shakya
Institute of Machine Learning, Johannes Kepler University Linz, Austria; Pro2future GmbH, Linz, Austria

Anamaria-Roberta Hartl
Institute of Machine Learning, Johannes Kepler University Linz, Austria

Sepp Hochreiter
Institute for Machine Learning, Johannes Kepler University Linz
Machine Learning · Deep Learning · Artificial Intelligence · Neural Networks · Bioinformatics

Korbinian Pöppel
ELLIS Unit Linz, Johannes-Kepler University Linz
Artificial Intelligence · Machine Learning · Physics