RECON: Reasoning with Condensation for Efficient Retrieval-Augmented Generation

📅 2025-10-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
RAG systems often suffer from high context costs and degraded performance in reinforcement learning (RL)-driven multi-step reasoning due to lengthy, noisy retrieved documents. To address this, we propose RECON, the first framework to integrate a learnable, explicit summarization module into the RL-based RAG inference loop. RECON employs a two-stage training strategy — relevance-aware pretraining followed by multi-dimensional factual knowledge distillation — to ensure summaries are both accurate and concise. Integrated end-to-end into the Search-R1 pipeline, RECON reduces average context length by 35% on multi-hop QA tasks while significantly lowering inference latency and training time. On benchmark evaluations, it improves exact match (EM) scores by 14.5% and 3.0% for 3B- and 7B-parameter models, respectively. These results demonstrate that learned contextual compression enables synergistic optimization of both efficiency and effectiveness in RAG systems.

📝 Abstract
Retrieval-augmented generation (RAG) systems trained using reinforcement learning (RL) with reasoning are hampered by inefficient context management, where long, noisy retrieved documents increase costs and degrade performance. We introduce RECON (REasoning with CONdensation), a framework that integrates an explicit summarization module to compress evidence within the reasoning loop. Our summarizer is trained via a two-stage process: relevance pretraining on QA datasets, followed by multi-aspect distillation from proprietary LLMs to ensure factuality and clarity. Integrated into the Search-R1 pipeline, RECON reduces total context length by 35%, leading to improved training speed and inference latency, while simultaneously improving RAG performance on downstream QA benchmarks. Notably, it boosts the average EM score of the 3B model by 14.5% and the 7B model by 3.0%, showing particular strength in multi-hop QA. RECON demonstrates that learned context compression is essential for building practical, scalable, and performant RAG systems. Our code implementation is made available at https://github.com/allfornancy/RECON.
Problem

Research questions and friction points this paper is trying to address.

Compressing retrieved documents to reduce context length
Improving training speed and inference latency in RAG systems
Enhancing RAG performance on question-answering benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates explicit summarization module for evidence compression
Uses two-stage training with relevance pretraining and distillation
Reduces context length by 35%, improving speed and performance
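The innovations above describe a reasoning loop in which retrieved documents pass through a learned condensation step before entering the model's context. A minimal sketch of that loop is shown below; the function names, the `retrieve`/`summarize`/`next_query`/`answer` interfaces, and the control flow are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of a multi-step RAG loop with an explicit
# condensation step, in the spirit of RECON. All callables here are
# stand-ins: retrieve() returns raw passages, summarize() is the
# learned summarization module, next_query() and answer() stand in
# for the RL-trained reasoner.

def condensed_rag_loop(question, retrieve, summarize, next_query, answer,
                       max_steps=4):
    """Iteratively search, condense evidence, and reason until the
    reasoner stops issuing queries or the step budget runs out."""
    context = []
    for _ in range(max_steps):
        query = next_query(question, context)
        if query is None:          # reasoner is ready to answer
            break
        docs = retrieve(query)     # long, possibly noisy passages
        evidence = summarize(query, docs)   # learned condensation step
        context.append((query, evidence))   # only the summary enters context
    return answer(question, context)
```

The key design point the bullets highlight is that `context` accumulates condensed evidence rather than full documents, which is where the reported 35% context-length reduction comes from.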