In-Context Optimization for Retrieval-Augmented Generation: A Gradient-Descent Perspective

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of conventional retrieval-augmented generation (RAG) systems: retrieved documents are treated as static evidence rather than as signals for context-aware adaptation. The study connects RAG to in-context optimization by showing that one linear self-attention layer can implement a single gradient-descent step on a unified linearized RAG objective. Building on this insight, the authors propose a lightweight, forward-only method that predicts a context-conditioned update to how the generator uses retrieved evidence, leaving both the retriever and the large language model (LLM) frozen. Evaluated on seven question-answering benchmarks with two retrievers and two frozen LLMs, the method improves over a shared-interface baseline, transfers to held-out tasks, and approaches the performance of test-time gradient-based adaptation at much lower per-query cost.

📝 Abstract

In-context learning has recently been linked to implicit gradient descent in linear self-attention models, suggesting that context can induce a forward-pass update. Retrieval-augmented generation (RAG) also relies on context, but retrieved documents are usually treated as static evidence rather than signals for adaptation. We study RAG as an in-context optimization process. First, we show that one linear self-attention layer can implement one gradient-descent step on a unified linearized RAG objective covering both projection-based and dot-product retrieval interfaces. This gives an exact regime where retrieval-augmented prediction and in-context optimization coincide. We use this result not as a literal model of LLM computation, but as a guide for adapting the interaction between queries and retrieved evidence. We then test the boundary of this correspondence: it remains stable under controlled linear extensions, but becomes feature-distribution dependent under nonlinear architectures. Finally, we turn this view into a lightweight method for frozen RAG LLMs. The method keeps the retriever and backbone fixed, and predicts a context-conditioned update to a generator-side evidence-use interface. Across seven QA benchmarks, two retrievers, and two frozen LLM backbones, this forward-only update improves a shared-interface baseline, transfers to held-out tasks, and approaches test-time gradient adaptation at much lower per-query cost.
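The correspondence between linear self-attention and gradient descent can be illustrated in the simplest setting, in-context linear regression. The sketch below is not the paper's unified RAG objective, which this page does not spell out. It assumes a squared loss L(W) = 1/(2N) Σ ||W x_i − y_i||², a zero initialization, and a softmax-free attention readout with keys x_i, values y_i, and the query vector as the attention query. The function names, shapes, and the step size `eta` are all hypothetical choices made for illustration.

```python
import numpy as np


def gd_step_prediction(X, Y, x_query, eta):
    """Predict for x_query after one gradient-descent step from W = 0
    on L(W) = 1/(2N) * sum_i ||W x_i - y_i||^2 (assumed toy objective)."""
    n = X.shape[0]
    W0 = np.zeros((Y.shape[1], X.shape[1]))
    grad = (W0 @ X.T - Y.T) @ X / n   # dL/dW evaluated at W0
    W1 = W0 - eta * grad              # single gradient-descent step
    return W1 @ x_query


def linear_attention_prediction(X, Y, x_query, eta):
    """Softmax-free attention readout: keys = x_i, values = y_i,
    query = x_query, scaled by eta / N."""
    n = X.shape[0]
    scores = X @ x_query              # unnormalized dot products k_i . q
    return (eta / n) * (Y.T @ scores) # score-weighted sum of values


# Toy data standing in for context features (hypothetical shapes).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))    # 8 context inputs, dimension 4
Y = rng.normal(size=(8, 2))    # 8 context targets, dimension 2
x_q = rng.normal(size=4)       # query input

pred_gd = gd_step_prediction(X, Y, x_q, eta=0.1)
pred_attn = linear_attention_prediction(X, Y, x_q, eta=0.1)
match = np.allclose(pred_gd, pred_attn)
```

Under these assumptions the two predictions agree because, starting from W = 0, the updated weight matrix is (eta / N) Σ y_i x_iᵀ, and applying it to the query yields the same dot-product-weighted sum of values that the attention readout computes. The agreement relies on the linear, zero-initialized setup; the abstract itself notes that the correspondence becomes feature-distribution dependent under nonlinear architectures.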

Problem

Research questions and friction points this paper is trying to address.

Retrieval-Augmented Generation

In-Context Learning

Gradient Descent

Context Adaptation

Frozen LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

In-Context Optimization

Retrieval-Augmented Generation

Gradient Descent