🤖 AI Summary
In retrieval-augmented generation (RAG) systems, a mismatch exists between retrieval relevance and generative utility: retrieved documents are often topically relevant yet lack the critical information required for reasoning. Method: We propose a process-supervised rewriting optimization framework that, uniquely, uses the intermediate reasoning signals produced while the LLM generates its answer as explicit supervision, modeling document utility more accurately. We further design an LLM distillation pipeline that transfers a large model's judgment of rewriting quality to a lightweight rewriting model. Contribution/Results: Our method jointly optimizes retrieval, rewriting, and answer generation end to end. It achieves significant improvements over strong baselines across multiple open-domain question answering benchmarks, demonstrating both the effectiveness and the generalizability of process supervision for enhancing generative utility.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems often suffer from a gap between optimizing retrieval relevance and generative utility: retrieved documents may be topically relevant yet still lack the content needed for effective reasoning during generation. While existing "bridge" modules attempt to rewrite retrieved text to better support generation, we show that they fail to capture true document utility. In this work, we propose R2U, whose key distinction is directly optimizing the rewriter to maximize the probability of generating a correct answer through process supervision. Because such direct supervision is expensive to obtain, we further approximate it with an efficient distillation pipeline that scales supervision from LLMs, helping the smaller rewriter model generalize better. We evaluate our method across multiple open-domain question-answering benchmarks, and the empirical results demonstrate consistent improvements over strong bridging baselines.
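The central idea, scoring a rewritten document by the generator's probability of producing the correct answer rather than by topical similarity, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `toy_generator` scorer and all function names are assumptions, and in practice the score would come from an LLM's summed token log-probabilities conditioned on the question and the rewritten document.

```python
import math

def answer_log_prob(generator, question, document, gold_answer):
    """Generative utility of a (rewritten) document: the generator's
    log-probability of the gold answer given question + document.
    Here `generator` is a stand-in callable; in practice it would be
    an LLM returning summed token log-probs."""
    return generator(question, document, gold_answer)

def best_rewrite(generator, question, rewrites, gold_answer):
    """Select the candidate rewrite with the highest generative utility,
    i.e. the one most likely to lead the generator to a correct answer."""
    return max(
        rewrites,
        key=lambda doc: answer_log_prob(generator, question, doc, gold_answer),
    )

# Toy scorer for illustration only: assigns log-prob 0 (prob 1) to documents
# that contain the answer string, and -inf otherwise.
def toy_generator(question, document, gold_answer):
    return 0.0 if gold_answer in document else -math.inf

candidates = [
    "France is a country in Europe.",          # topically relevant, low utility
    "The capital of France is Paris.",         # actually useful for answering
]
print(best_rewrite(toy_generator, "What is the capital of France?",
                   candidates, "Paris"))
# → The capital of France is Paris.
```

The first candidate would score well under a pure relevance metric, but only the second raises the generator's chance of answering correctly, which is exactly the gap the abstract describes.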