🤖 AI Summary
Retro-li addresses the challenges that arise in RAG systems with small-scale non-parametric memory banks, where data sparsity leads to inaccurate retrieval, noise sensitivity, and poor cross-domain generalization. It proposes a lightweight, robust retrieval-augmented generation framework. Its core contributions are: (1) the first introduction of non-parametric memory regularization, substantially improving the robustness of semantic retrieval under noisy conditions and generalization under domain shift; and (2) an in-memory-computing-friendly architecture enabling O(1) constant-time retrieval. Experiments demonstrate that Retro-li maintains high retrieval accuracy on small memory banks, achieves significantly lower perplexity than baselines, delivers marked performance gains on cross-domain tasks, and incurs less than 1% accuracy degradation from retrieval noise in hardware simulations.
📝 Abstract
Retrieval-augmented generation (RAG) systems such as Retro have been shown to improve language modeling capabilities and reduce toxicity and hallucinations by retrieving from a database of non-parametric memory containing trillions of entries. We introduce Retro-li, which shows that retrieval can also help with a small-scale database, but it demands more accurate and better neighbors when searching a smaller, hence sparser, non-parametric memory. This can be met by using a proper semantic similarity search. We further propose, for the first time, adding regularization to the non-parametric memory: it significantly reduces perplexity when the neighbor search operations are noisy during inference, and it improves generalization when a domain shift occurs. We also show that Retro-li's non-parametric memory can potentially be implemented on analog in-memory computing hardware, exhibiting O(1) search time while introducing noise into neighbor retrieval, with minimal (<1%) performance loss. Our code is available at: https://github.com/IBM/Retrieval-Enhanced-Transformer-Little.
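To make the abstract's claim concrete, here is a minimal sketch of cosine-similarity neighbor search over a small non-parametric memory, with optional Gaussian perturbation of the memory modeling the noisy analog in-memory search described above. This is an illustrative toy using random embeddings and an assumed isotropic noise model, not the paper's actual implementation; all names (`memory`, `retrieve`, `noise_std`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy non-parametric memory: N chunk embeddings of dimension d,
# L2-normalized so a dot product equals cosine similarity.
N, d = 1000, 64
memory = rng.normal(size=(N, d)).astype(np.float32)
memory /= np.linalg.norm(memory, axis=1, keepdims=True)

def retrieve(query: np.ndarray, k: int = 4, noise_std: float = 0.0) -> np.ndarray:
    """Top-k cosine-similarity search over the memory.

    noise_std > 0 perturbs the stored embeddings with Gaussian noise,
    a crude stand-in (our assumption) for analog in-memory hardware noise.
    """
    q = query / np.linalg.norm(query)
    noisy_memory = memory + noise_std * rng.normal(size=memory.shape).astype(np.float32)
    scores = noisy_memory @ q          # cosine similarity to every entry
    return np.argsort(-scores)[:k]     # indices of the k best neighbors

# A query close to memory entry 42 should still recover it under mild noise.
query = memory[42] + 0.05 * rng.normal(size=d).astype(np.float32)
clean_neighbors = retrieve(query, noise_std=0.0)
noisy_neighbors = retrieve(query, noise_std=0.05)
```

In this toy setting the true neighbor survives mild perturbation, loosely mirroring the abstract's observation that hardware noise in neighbor retrieval costs little accuracy; regularizing the memory during training is what makes the real model tolerant of such noise.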