MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation

📅 2024-09-09
📈 Citations: 23
Influential: 4
🤖 AI Summary
Large language models (LLMs) face significant challenges in long-context processing, including prohibitive computational overhead, excessive memory consumption, and the limitations of traditional RAG, namely its reliance on explicit queries and well-structured knowledge. To address these issues, we propose a dual-system RAG framework: a lightweight long-range system that constructs a global memory of the context, generates clue-rich draft answers, and guides retrieval of relevant passages; and a heavyweight expressive system that synthesizes a high-quality final answer from the retrieved content. Our key contributions are (1) a global-memory-augmented RAG paradigm that removes the dependence on explicit queries and structured knowledge, and (2) a memory module realized as KV compression and reinforced with feedback from generation quality (RLGF), which strengthens its memorization and cluing capacity. Extensive evaluation across diverse long-context benchmarks demonstrates substantial improvements over state-of-the-art baselines, particularly in complex scenarios where conventional RAG fails, with a superior trade-off between efficiency and effectiveness.

📝 Abstract
Processing long contexts presents a significant challenge for large language models (LLMs). While recent advances allow LLMs to handle much longer contexts than before (e.g., 32K or 128K tokens), doing so is computationally expensive and can still fall short of many applications' needs. Retrieval-Augmented Generation (RAG) is considered a promising strategy to address this problem. However, conventional RAG methods face inherent limitations because of two underlying requirements: 1) explicitly stated queries, and 2) well-structured knowledge. These conditions do not hold in general long-context processing tasks. In this work, we propose MemoRAG, a novel RAG framework empowered by global memory-augmented retrieval. MemoRAG features a dual-system architecture. First, it employs a light but long-range system to create a global memory of the long context. Once a task is presented, this system generates draft answers that provide useful clues for the retrieval tools to locate relevant information within the long context. Second, it leverages an expensive but expressive system, which generates the final answer based on the retrieved information. Building on this framework, we realize the memory module in the form of KV compression and reinforce its memorization and cluing capacity with feedback from generation quality (a.k.a. RLGF). In our experiments, MemoRAG achieves superior performance across a variety of long-context evaluation tasks, not only in complex scenarios where traditional RAG methods struggle, but also in simpler ones where RAG is typically applied.
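The dual-system flow in the abstract can be sketched in a few lines. Everything below is a toy illustration: simple keyword heuristics stand in for the trained light and heavy models, and all function names are hypothetical, not the paper's API.

```python
# Toy sketch of MemoRAG's dual-system flow. The "light" and "heavy"
# systems are keyword heuristics here, standing in for trained models.

def build_memory(long_context: str, stride: int = 2) -> str:
    """Light system: form a coarse global memory of the context
    (here: keep every `stride`-th sentence, a crude surrogate for
    learned KV compression)."""
    sentences = [s.strip() for s in long_context.split(".") if s.strip()]
    return ". ".join(sentences[::stride])

def draft_clues(memory: str, query: str) -> list[str]:
    """Light system: draft clue terms for the query from the memory
    (here: words from memory sentences that share a word with the query)."""
    query_words = set(query.lower().split())
    clues: list[str] = []
    for sentence in memory.split("."):
        words = sentence.lower().split()
        if query_words & set(words):
            clues.extend(words)
    return clues

def retrieve(long_context: str, clues: list[str], top_k: int = 2) -> list[str]:
    """Rank raw passages by overlap with the drafted clues, not just the query."""
    clue_set = set(clues)
    passages = [s.strip() for s in long_context.split(".") if s.strip()]
    passages.sort(key=lambda p: -len(set(p.lower().split()) & clue_set))
    return passages[:top_k]

def answer(query: str, passages: list[str]) -> str:
    """Heavy system stand-in: compose a final answer from the evidence."""
    return f"{query}: " + " | ".join(passages)

context = ("The capital of Freedonia is Oz. Freedonia exports tulips. "
           "Rents in Oz are rising. Northern weather stays mild.")
clues = draft_clues(build_memory(context), "capital of Freedonia")
evidence = retrieve(context, clues)
# The "capital ... Oz" passage ranks first among the retrieved evidence.
print(answer("capital of Freedonia", evidence))
```

The point of the sketch is the ordering of responsibilities: the memory is consulted before retrieval, so retrieval is driven by drafted clues rather than by the raw query alone.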
Problem

Research questions and friction points this paper is trying to address.

Enhancing long context processing in LLMs with global memory
Overcoming limitations of traditional RAG in unstructured contexts
Reducing computational costs while improving retrieval accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global memory-augmented retrieval for RAG
Dual-system architecture with global memory
KV compression and RLGF for memory enhancement
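The KV-compression idea in the last bullet can be pictured with a toy pooling scheme: a long sequence of key/value vectors is condensed into far fewer "memory tokens". MemoRAG learns this compression and tunes it with RLGF (rewards derived from final generation quality); the uniform mean-pooling below is only an illustration of the shape of the idea, not the paper's method.

```python
# Toy stand-in for KV compression: pool every `ratio` consecutive
# key (or value) vectors into one memory token. In MemoRAG this
# mapping is learned and RLGF-tuned; mean-pooling is illustrative only.

def mean_pool_kv(vectors: list[list[float]], ratio: int = 4) -> list[list[float]]:
    """Compress T vectors into T // ratio pooled vectors (ragged tail dropped)."""
    n = len(vectors) // ratio * ratio
    pooled = []
    for i in range(0, n, ratio):
        group = vectors[i:i + ratio]
        dim = len(group[0])
        pooled.append([sum(v[j] for v in group) / ratio for j in range(dim)])
    return pooled

# 12 toy key vectors of dimension 2 compress to 3 memory tokens at ratio 4.
keys = [[float(t), float(t % 3)] for t in range(12)]
memory_keys = mean_pool_kv(keys, ratio=4)
print(len(memory_keys))  # -> 3
```

At a compression ratio of 4, the memory footprint of the cached context shrinks by roughly 4x, which is what makes the light system cheap to run over very long inputs.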