🤖 AI Summary
Linear recurrent models significantly underperform Transformers on long-context tasks such as in-context learning. To address this, we propose the first lightweight retrieval-augmented framework tailored for linear recurrent architectures. Our core innovation is the seamless integration of a dynamic context retrieval mechanism into the linear recurrence process, enabling task-adaptive information injection. The method comprises three key components: (i) key-value-based context-aware attention, (ii) an SSM-compatible state interface for recurrent state management, and (iii) a differentiable approximate nearest-neighbor retrieval module. The framework is plug-and-play and compatible with any linear recurrent backbone. Evaluated across multiple synthetic and real-world NLP benchmarks, it achieves an average accuracy improvement of 12.7% and reduces context copying error rate by 41%, substantially narrowing the performance gap with Transformer models.
📝 Abstract
Recent shifts in the space of large language model (LLM) research have shown an increasing focus on novel architectures to compete with prototypical Transformer-based models that have long dominated this space. Linear recurrent models have proven to be a viable competitor due to their computational efficiency. However, such models still demonstrate a sizable gap compared to Transformers in terms of in-context learning among other tasks that require recalling information from a context. In this work, we introduce __Resona__, a simple and scalable framework for augmenting linear recurrent models with retrieval. __Resona__~augments models with the ability to integrate retrieved information from the provided input context, enabling tailored behavior to diverse task requirements. Experiments on a variety of linear recurrent models demonstrate that __Resona__-augmented models observe significant performance gains on a variety of synthetic as well as real-world natural language tasks, highlighting its ability to act as a general purpose method to improve the in-context learning and language modeling abilities of linear recurrent LLMs.