🤖 AI Summary
This work addresses a key limitation of traditional retrieval-augmented generation (RAG) systems, which treat retrieved documents as static evidence rather than as signals for context-aware adaptation. The study establishes, for the first time, a theoretical connection between RAG and in-context optimization, showing that one linear self-attention layer can implement a single gradient-descent step on a unified linearized RAG objective. Building on this insight, the authors propose a lightweight feedforward reweighting strategy, motivated by the linear self-attention analysis, that conditions evidence use on the retrieved context while keeping both the retriever and the large language model (LLM) frozen. Evaluated across seven question-answering benchmarks, two retrievers, and two frozen LLMs, the method improves on a shared-interface baseline, transfers to held-out tasks, and approaches the performance of test-time gradient-based adaptation at much lower per-query cost.
📝 Abstract
In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
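The correspondence between linear self-attention and gradient descent can be illustrated in its simplest textbook form. The sketch below is not the paper's unified RAG objective: it assumes plain least-squares regression with scalar targets, a zero initial weight vector, and hand-set key, query, and value matrices. All names, including `gd_one_step_prediction` and `linear_attention_prediction`, are hypothetical. Under those assumptions, one softmax-free self-attention layer reproduces the prediction made after one gradient-descent step.

```python
import numpy as np


def gd_one_step_prediction(X, y, x_query, lr):
    """Predict at x_query after one GD step on L(w) = 1/(2n) * sum (w.x_i - y_i)^2, from w = 0."""
    n, d = X.shape
    w0 = np.zeros(d)                      # assumed zero initialization
    grad = X.T @ (X @ w0 - y) / n         # gradient of the least-squares loss at w0
    w1 = w0 - lr * grad                   # single gradient-descent step
    return float(w1 @ x_query)


def linear_attention_prediction(X, y, x_query, lr):
    """Same prediction, read from the target slot of one linear self-attention layer."""
    n, d = X.shape
    # Context tokens are [x_i, y_i]; the query token is [x_query, 0].
    tokens = np.hstack([X, y[:, None]])               # shape (n, d + 1)
    query_token = np.concatenate([x_query, [0.0]])    # shape (d + 1,)

    # Hand-set projections (illustrative assumption, not learned weights):
    W_K = np.zeros((d + 1, d + 1))
    W_K[:d, :d] = np.eye(d)               # keys copy the input part x_i
    W_Q = W_K.copy()                      # queries copy the input part x_query
    W_V = np.zeros((d + 1, d + 1))
    W_V[d, d] = lr / n                    # values carry (lr / n) * y_i in the target slot

    keys = tokens @ W_K.T
    values = tokens @ W_V.T
    q = W_Q @ query_token

    scores = keys @ q                     # unnormalized dot products, no softmax
    update = values.T @ scores            # sum_i value_i * (key_i . q)
    out = query_token + update            # residual connection
    return float(out[d])                  # target slot now holds the prediction


rng = np.random.default_rng(0)
X_demo = rng.normal(size=(8, 3))
y_demo = rng.normal(size=8)
xq_demo = rng.normal(size=3)
print(gd_one_step_prediction(X_demo, y_demo, xq_demo, lr=0.1))
print(linear_attention_prediction(X_demo, y_demo, xq_demo, lr=0.1))
```

The two printed values agree up to floating-point error because both compute (lr / n) · Σᵢ yᵢ (xᵢ · x_query). The query token is left out of the attention sum here; including it would change nothing, since its target slot is zero. As the abstract notes, this exact equivalence holds in the linear regime and should be read as a guide rather than a literal model of LLM computation.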