Retrieval from Within: An Intrinsic Capability of Attention-Based Models

📅 2026-05-07
📈 Citations: 0
Influential citations: 0
🤖 AI Summary
This work addresses the retriever-generator mismatch that arises when Retrieval-Augmented Generation (RAG) pipelines treat retrieval and generation as separate modules. The authors propose INTRA (INTrinsic Retrieval via Attention), a framework in which the decoder attention queries of an encoder-decoder model score pre-encoded evidence chunks, and the selected chunks are reused directly as context for generation. Because retrieval happens inside the model, INTRA unifies the two steps without an external retrieval module, and it amortizes the cost of context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality.
📝 Abstract
Retrieval-augmented generation (RAG) typically treats retrieval and generation as separate systems. We ask whether an attention-based encoder-decoder can instead retrieve directly from its own internal representations. We introduce INTRA (INTrinsic Retrieval via Attention), a framework where decoder attention queries score pre-encoded evidence chunks that are then directly reused as context for generation. By construction, INTRA unifies retrieval and generation, eliminating the retriever-generator mismatch typical of RAG pipelines. This design also amortizes context encoding by reusing precomputed encoder states across queries. On question-answering benchmarks, INTRA outperforms strong engineered retrieval pipelines on both evidence recall and end-to-end answer quality. Our results demonstrate that attention-based models already possess a retrieval mechanism that can be elicited, rather than added as an external module.
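The scoring-and-reuse idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say how token-level attention scores are pooled into a chunk score, so max pooling over scaled dot-product logits is an assumption here, and the names `attention_scores`, `score_chunk`, `retrieve`, and `ENCODED_CHUNKS` are hypothetical. The vectors are hand-picked stand-ins for real encoder states.

```python
import math

def attention_scores(query, keys):
    """Scaled dot-product logits between one query vector and each key vector."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]

def score_chunk(query, chunk_states):
    """Pool token-level logits into one chunk score.

    Assumption: max pooling. The abstract does not specify the aggregation.
    """
    return max(attention_scores(query, chunk_states))

def retrieve(query, encoded_chunks, top_k=1):
    """Rank pre-encoded chunks and return (chunk_id, cached_states) pairs.

    The cached states are returned as-is, so they can be handed to the
    generator without re-encoding the chunk text.
    """
    ranked = sorted(
        encoded_chunks,
        key=lambda cid: score_chunk(query, encoded_chunks[cid]),
        reverse=True,
    )
    return [(cid, encoded_chunks[cid]) for cid in ranked[:top_k]]

# Toy cache of "encoder states": computed once, reused for every query.
ENCODED_CHUNKS = {
    "chunk_a": [[1.0, 0.0], [0.5, 0.5]],
    "chunk_b": [[0.0, 1.0], [0.1, 0.9]],
    "chunk_c": [[-1.0, 0.0], [0.0, -1.0]],
}

# Stand-in for a decoder attention query derived from the question.
query = [0.0, 2.0]
selected = retrieve(query, ENCODED_CHUNKS, top_k=2)
print([cid for cid, _ in selected])
```

In this sketch the same vectors serve both for ranking and as the context passed on to generation, which is the sense in which retrieval and generation share one representation; a real system would take the query and the chunk states from the model's own attention layers.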
Problem

Research questions and friction points this paper is trying to address.

retrieval-augmented generation
retriever-generator mismatch
attention-based models
intrinsic retrieval
internal representations
Innovation

Methods, ideas, or system contributions that make the work stand out.

intrinsic retrieval
attention mechanism
retrieval-augmented generation
encoder-decoder architecture
context reuse