🤖 AI Summary
This work addresses the challenge of dynamic data selection for retrieval-augmented reasoning in state-space models. We propose RICO, the first framework to perform end-to-end retrieval weight learning using gradient feedback from the large language model (LLM) itself, without external heuristics or supervised fine-tuning. RICO unifies retrieval-augmented generation (RAG), in-context optimization, and gradient-driven document weighting, optimizing retrieval quality via unsupervised perplexity minimization to reduce query-level reasoning uncertainty. Experiments show that RICO matches BM25's retriever performance without any fine-tuning and often outperforms supervised dense retrievers (e.g., E5) in final prediction accuracy. By enabling fully unsupervised, adaptive retrieval, RICO establishes a novel paradigm for dynamic, self-calibrating retrieval in LLM-based reasoning systems.
📝 Abstract
Given a query and a dataset, the optimal way of answering the query is to make use of all the information available. Modern LLMs exhibit an impressive ability to memorize training data, but data not deemed important during training is forgotten, and information outside the training set cannot be used at all. Processing an entire dataset at inference time is infeasible due to the bounded nature of model resources (e.g., context size in transformers or states in state-space models), so we must resort to external memory. This constraint naturally leads to the following problem: how can we decide, based on the present query and model, what among a virtually unbounded set of known data matters for inference? To minimize model uncertainty for a particular query at test time, we introduce Retrieval In-Context Optimization (RICO), a retrieval method that uses gradients from the LLM itself to learn the optimal mixture of documents for answer generation. Unlike traditional retrieval-augmented generation (RAG), which relies on external heuristics for document retrieval, our approach leverages direct feedback from the model. Theoretically, we show that standard top-$k$ retrieval with model gradients can approximate our optimization procedure, and we draw connections to the leave-one-out loss. We demonstrate empirically that by minimizing an unsupervised loss objective in the form of question perplexity, we can achieve retriever metric performance comparable to BM25 with \emph{no fine-tuning}. Furthermore, when evaluated on the quality of the final prediction, our method often outperforms fine-tuned dense retrievers such as E5.
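The core idea of the abstract, learning a softmax mixture over candidate documents by backpropagating the query's perplexity through the LLM, can be illustrated with a minimal sketch. The toy "LM" below (a fixed random projection head), the embedding shapes, and all variable names are illustrative assumptions, not the paper's actual architecture or implementation; the sketch only shows the optimization pattern: frozen model, learnable retrieval logits, perplexity-style loss on the query tokens.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a frozen LM and a candidate document pool (assumptions):
K, d, vocab = 8, 16, 32
doc_emb = torch.randn(K, d)           # embeddings of K candidate documents
W_lm = 0.5 * torch.randn(d, vocab)    # frozen toy "LM head": context -> token logits
query_tokens = torch.randint(0, vocab, (6,))  # tokenized query

# Learnable retrieval logits; softmax gives a mixture over documents.
alpha = torch.zeros(K, requires_grad=True)
opt = torch.optim.Adam([alpha], lr=0.1)

def query_nll(alpha):
    """Negative log-likelihood (log-perplexity) of the query under the mixed context."""
    w = torch.softmax(alpha, dim=0)   # document mixture weights
    ctx = w @ doc_emb                 # weighted-sum context vector
    logits = ctx @ W_lm               # toy next-token logits from the context
    return F.cross_entropy(logits.expand(len(query_tokens), -1), query_tokens)

init_loss = query_nll(alpha).item()
for _ in range(200):
    loss = query_nll(alpha)
    opt.zero_grad()
    loss.backward()                   # gradients flow from the LM loss to alpha
    opt.step()
final_loss = query_nll(alpha).item()

# Reading off the top-k highest-weight documents recovers a standard
# top-k retrieval from the learned mixture.
top_docs = torch.topk(torch.softmax(alpha, dim=0), k=2).indices
```

Only the retrieval logits `alpha` receive gradient updates; the model stays frozen, which is what lets the procedure run unsupervised at test time, with the query's own perplexity as the objective.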