🤖 AI Summary
This work addresses the challenge of dynamic data selection for retrieval-augmented reasoning in state-space models. We propose RICO, the first framework to perform end-to-end retrieval weight learning using gradient feedback from the large language model (LLM) itself, without external heuristics or supervised fine-tuning. RICO unifies retrieval-augmented generation (RAG), in-context optimization, and gradient-driven document weighting, optimizing retrieval quality via unsupervised perplexity minimization to reduce query-level reasoning uncertainty. Experiments show that RICO matches BM25's retriever performance without any fine-tuning and often outperforms supervised dense retrievers (e.g., E5) in final prediction accuracy. By enabling fully unsupervised, adaptive retrieval, RICO establishes a novel paradigm for dynamic, self-calibrating retrieval in LLM-based reasoning systems.
📝 Abstract
Given a query and a dataset, the optimal way of answering the query is to make use of all the information available. Modern LLMs exhibit an impressive ability to memorize training data, but data not deemed important during training is forgotten, and information outside the training set cannot be used at all. Processing an entire dataset at inference time is infeasible due to the bounded nature of model resources (e.g., context size in transformers or states in state-space models), so we must resort to external memory. This constraint naturally leads to the following problem: how can we decide, based on the present query and model, what among a virtually unbounded set of known data matters for inference? To minimize model uncertainty for a particular query at test time, we introduce Retrieval In-Context Optimization (RICO), a retrieval method that uses gradients from the LLM itself to learn the optimal mixture of documents for answer generation. Unlike traditional retrieval-augmented generation (RAG), which relies on external heuristics for document retrieval, our approach leverages direct feedback from the model. Theoretically, we show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and we draw connections to the leave-one-out loss. We demonstrate empirically that by minimizing an unsupervised loss objective in the form of question perplexity, we can achieve retriever metric performance comparable to BM25 with \emph{no fine-tuning}. Furthermore, when evaluated on the quality of the final prediction, our method often outperforms fine-tuned dense retrievers such as E5.
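The core idea of the abstract, learning a softmax mixture over candidate documents by backpropagating the query's perplexity through the LLM, can be illustrated with a minimal sketch. The toy "LM" below (a fixed random projection head), the embedding shapes, and all variable names are illustrative assumptions, not the paper's actual architecture or implementation; the sketch only shows the optimization pattern: frozen model, learnable retrieval logits, perplexity-style loss on the query tokens.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a frozen LM and a candidate document pool (assumptions):
K, d, vocab = 8, 16, 32
doc_emb = torch.randn(K, d)           # embeddings of K candidate documents
W_lm = 0.5 * torch.randn(d, vocab)    # frozen toy "LM head": context -> token logits
query_tokens = torch.randint(0, vocab, (6,))  # tokenized query

# Learnable retrieval logits; softmax gives a mixture over documents.
alpha = torch.zeros(K, requires_grad=True)
opt = torch.optim.Adam([alpha], lr=0.1)

def query_nll(alpha):
    """Negative log-likelihood (log-perplexity) of the query under the mixed context."""
    w = torch.softmax(alpha, dim=0)   # document mixture weights
    ctx = w @ doc_emb                 # weighted-sum context vector
    logits = ctx @ W_lm               # toy next-token logits from the context
    return F.cross_entropy(logits.expand(len(query_tokens), -1), query_tokens)

init_loss = query_nll(alpha).item()
for _ in range(200):
    loss = query_nll(alpha)
    opt.zero_grad()
    loss.backward()                   # gradients flow from the LM loss to alpha
    opt.step()
final_loss = query_nll(alpha).item()

# Reading off the top-k highest-weight documents recovers a standard
# top-k retrieval from the learned mixture.
top_docs = torch.topk(torch.softmax(alpha, dim=0), k=2).indices
```

Only the retrieval logits `alpha` receive gradient updates; the model stays frozen, which is what lets the procedure run unsupervised at test time, with the query's own perplexity as the objective.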