🤖 AI Summary
This work addresses a critical limitation of current retrieval-augmented generation (RAG) systems: query-time adaptations are not persisted, causing redundant computation and precluding cumulative learning, while index-side updates often induce semantic drift and noise accumulation. To overcome this, the authors propose Evolving Retrieval Memory (ERM), a training-free continual-learning framework that converts transient query-time gains into persistent key evolution through correctness-gated feedback and atomic-level signal attribution. ERM establishes, for the first time, a theoretical equivalence between query expansion and key expansion under standard similarity functions, introduces a norm-bounded stable update mechanism, and proves convergence under selective updating, enabling online index optimization with zero inference overhead. Evaluated across 13 domains from the BEIR and BRIGHT benchmarks, ERM significantly improves both retrieval and generation performance, excelling particularly on reasoning-intensive tasks while preserving native retrieval speed.
📝 Abstract
Retrieval-augmented generation (RAG) systems commonly improve robustness via query-time adaptations such as query expansion and iterative retrieval. While effective, these approaches are inherently stateless: adaptations are recomputed for each query and discarded thereafter, precluding cumulative learning and repeatedly incurring inference-time cost. Index-side approaches like key expansion introduce persistence but rely on offline preprocessing or heuristic updates that are weakly aligned with downstream task utility, leading to semantic drift and noise accumulation. We propose Evolving Retrieval Memory (ERM), a training-free framework that transforms transient query-time gains into persistent retrieval improvements. ERM updates the retrieval index through correctness-gated feedback, selectively attributes atomic expansion signals to the document keys they benefit, and progressively evolves keys via stable, norm-bounded updates. We show that query and key expansion are theoretically equivalent under standard similarity functions and prove convergence of ERM's selective updates, amortizing optimal query expansion into a stable index with zero inference-time overhead. Experiments on BEIR and BRIGHT across 13 domains demonstrate consistent gains in retrieval and generation, particularly on reasoning-intensive tasks, at native retrieval speed.
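The claimed equivalence between query expansion and key expansion is easy to see for dot-product similarity: the score gain from adding an expansion vector to the query can always be matched by a suitable update to the document key. The sketch below demonstrates this numerically and adds a correctness-gated, norm-bounded key update in the spirit of ERM; note that `evolve_key`, `eta`, and `max_norm` are illustrative names and parameters, not the paper's actual update rule:

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=8)   # query embedding
k = rng.normal(size=8)   # document key embedding
e = rng.normal(size=8)   # query-expansion vector

# Query expansion changes the dot-product score by <e, k>.
gain = e @ k

# An equivalent key-side update: move k along q so that
# <q, k + dk> - <q, k> produces the same gain.
dk = (gain / (q @ q)) * q
assert np.isclose((q + e) @ k, q @ (k + dk))

def evolve_key(k, dk, answered_correctly, eta=0.1, max_norm=None):
    """Correctness-gated, norm-bounded key update (illustrative only)."""
    if not answered_correctly:        # gate: persist only signals that helped
        return k
    if max_norm is None:
        max_norm = np.linalg.norm(k)  # keep the key's scale stable
    k_new = k + eta * dk
    norm = np.linalg.norm(k_new)
    if norm > max_norm:               # project back onto the norm ball
        k_new *= max_norm / norm
    return k_new
```

Because the update is applied to the stored key offline, retrieval at query time remains a plain similarity search, which is consistent with the abstract's claim of zero inference-time overhead.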