CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning

📅 2025-11-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenges of long-context processing in retrieval-augmented generation (RAG) and the misalignment between retrieval and generation objectives, this paper proposes CLaRa, a framework that jointly optimizes retrieval and generation via semantic compression in a shared continuous embedding space. The authors prove theoretically that retrieval relevance can be aligned with answer quality, enabling end-to-end training. CLaRa introduces four key components: (i) SCP, a synthetic data construction method for context compression; (ii) dual-supervised semantic-preserving compression using question-answering and paraphrase objectives; (iii) a differentiable top-k estimator that lets gradients flow through the retrieval step; and (iv) a single unified language modeling loss. On multiple open-domain question-answering benchmarks, CLaRa significantly outperforms text-based fine-tuning baselines, achieving state-of-the-art results in both compression ratio (up to a 3.2× improvement) and reranking accuracy (+5.8% MRR), the first demonstration of synergistic gains from high-fidelity compression and joint optimization.
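To make the joint-optimization idea concrete, here is a minimal NumPy sketch of a single forward step: passage vectors and a query vector live in one shared space, soft selection weights are computed over retrieval scores, and the generator conditions on the weighted mixture. All names and shapes are hypothetical, and a plain softmax stands in for the paper's differentiable top-k estimator; this is illustrative only, not CLaRa's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy shapes: 5 candidate passages, each compressed to a 4-dim vector
doc_vecs = rng.normal(size=(5, 4))   # compressed passage embeddings
query_vec = rng.normal(size=4)       # query embedding in the same space

# Retrieval scores in the shared continuous space
scores = doc_vecs @ query_vec

# Soft selection weights (a softmax stand-in for the differentiable
# top-k estimator) keep the selection step differentiable
tau = 0.5
w = np.exp((scores - scores.max()) / tau)
w = w / w.sum()

# The generator would condition on the weighted mixture of compressed
# vectors; in an autodiff framework, a single LM loss on the answer
# backpropagates through `w` into both the reranker scores and the
# compressed representations.
context = w @ doc_vecs
```

Because the selection weights are a smooth function of the scores, one language modeling loss is enough to train the reranker and the compressed representations jointly, which is the alignment property the paper formalizes.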

📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) with external knowledge but still suffers from long contexts and disjoint retrieval-generation optimization. In this work, we propose CLaRa (Continuous Latent Reasoning), a unified framework that performs embedding-based compression and joint optimization in a shared continuous space. To obtain semantically rich and retrievable compressed vectors, we introduce SCP, a key-preserving data synthesis framework using QA and paraphrase supervision. CLaRa then trains the reranker and generator end-to-end via a single language modeling loss, with gradients flowing through both modules using a differentiable top-k estimator. Theoretically, this unified optimization aligns retrieval relevance with answer quality. Experiments across multiple QA benchmarks show that CLaRa achieves state-of-the-art compression and reranking performance, often surpassing text-based fine-tuned baselines.
Problem

Research questions and friction points this paper is trying to address.

Addresses disjoint optimization in retrieval-augmented generation systems
Solves inefficient handling of long contexts in knowledge-enhanced LLMs
Improves semantic compression and joint training for retrieval-generation alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continuous Latent Reasoning unifies retrieval and generation
Key-preserving data synthesis enriches semantic compression
Differentiable top-k estimator enables end-to-end joint optimization
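The differentiable top-k idea above can be sketched with one common relaxation: k successive softmaxes, where each round softly selects one item and suppresses it for the next round (in the spirit of Plötz and Roth's neural nearest-neighbor selection). The paper does not specify which estimator CLaRa uses, so this NumPy sketch is an assumption-laden illustration of the general technique.

```python
import numpy as np

def soft_topk(scores, k, tau=0.1):
    """Relaxed top-k mask via k successive softmaxes.

    Each round takes a softmax over the scores, adds it to the mask,
    and suppresses the (softly) selected item with log1p(-p), so the
    next softmax concentrates on the remaining items. As tau -> 0 the
    mask approaches a hard 0/1 top-k indicator, while remaining
    differentiable for any tau > 0.
    """
    s = np.asarray(scores, dtype=float).copy()
    mask = np.zeros_like(s)
    for _ in range(k):
        p = np.exp((s - s.max()) / tau)  # shifted for numerical safety
        p /= p.sum()
        mask += p
        s = s + np.log1p(-p + 1e-9)      # softly remove selected item
    return mask

# With a small temperature the mask is nearly a hard top-2 indicator
m = soft_topk([3.0, 1.0, 2.0, 0.5], k=2, tau=0.01)
```

At low temperature the mask concentrates on the two highest scores, while at higher temperatures it spreads mass more evenly, trading selection sharpness for smoother gradients.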