🤖 AI Summary
This work addresses the limited response diversity and inadequate information coverage of retrieval-augmented generation (RAG). We propose a two-stage Plan-and-Refine framework: (1) a planning stage that generates a global query plan via multi-perspective prompting to explicitly model diversity; and (2) a refinement stage that iteratively performs conditional generation, self-refinement, and joint evaluation of both factual consistency and coverage, followed by ICAT-driven reward modeling to select the best response. This work introduces the first RAG paradigm that integrates *planning-first*, *iterative refinement*, and *joint evaluation*. Experiments on the ANTIQUE and TREC benchmarks show improvements of up to 13.1% and 15.41%, respectively, over strong baselines. A user study further confirms significant gains in response quality, usability, and perceived informativeness.
📝 Abstract
This paper studies the limitations of (retrieval-augmented) large language models (LLMs) in generating diverse and comprehensive responses, and introduces the Plan-and-Refine (P&R) framework based on a two-phase system design. In the global exploration phase, P&R generates a diverse set of plans for the given input, where each plan consists of a list of diverse query aspects with corresponding descriptions. This phase is followed by a local exploitation phase that generates a response proposal conditioned on each plan and iteratively refines it to improve its quality. Finally, a reward model is employed to select the proposal with the highest factuality and coverage. We conduct our experiments using the ICAT evaluation methodology, a recent approach for evaluating answer factuality and comprehensiveness. Experiments on two diverse information-seeking benchmarks, adapted from non-factoid question answering and the TREC search result diversification task, demonstrate that P&R significantly outperforms baselines, achieving up to a 13.1% improvement on the ANTIQUE dataset and a 15.41% improvement on the TREC dataset. Furthermore, a small-scale user study confirms the efficacy of the P&R framework.
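The two-phase procedure described in the abstract can be sketched as a simple control loop. This is a minimal, hedged illustration, not the authors' implementation: the callables `generate_plans`, `draft_response`, `refine`, and `icat_reward` are hypothetical stand-ins for the paper's LLM prompting steps and the ICAT-based reward model.

```python
# Sketch of the Plan-and-Refine (P&R) loop: global exploration over plans,
# local exploitation via iterative refinement, then reward-based selection.
# All callables below are placeholders, not the paper's actual components.
from typing import Callable, List


def plan_and_refine(
    query: str,
    generate_plans: Callable[[str], List[str]],      # global exploration
    draft_response: Callable[[str, str], str],       # plan-conditioned generation
    refine: Callable[[str, str, str], str],          # self-refinement step
    icat_reward: Callable[[str, str], float],        # factuality/coverage reward
    n_refine: int = 2,
) -> str:
    """Return the proposal with the highest reward across all plans."""
    proposals = []
    for plan in generate_plans(query):
        proposal = draft_response(query, plan)
        for _ in range(n_refine):
            proposal = refine(query, plan, proposal)
        proposals.append(proposal)
    # Select the proposal scoring highest on factuality and coverage.
    return max(proposals, key=lambda p: icat_reward(query, p))
```

Plugging in toy callables (e.g. a reward that favors one plan's output) exercises the selection logic; in the paper, each step would instead be an LLM call or the trained reward model.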