Beyond Independent Passages: Adaptive Passage Combination Retrieval for Retrieval Augmented Open-Domain Question Answering

📅 2025-07-05
🤖 AI Summary
Conventional RAG methods retrieve passages independently, which leads to redundant, noisy, and insufficiently diverse context, especially for multi-hop questions and in noisy corpora. AdaPCR addresses this by treating passage combinations, rather than individual passages, as the unit of retrieval and reranking, so that dependencies between passages are modeled explicitly. Its main components are: (1) adaptive passage-combination retrieval, which selects the number of retrieved passages without an additional stopping module; (2) context-aware query reformulation using concatenated passages; and (3) a reranking step trained with a predictive objective aligned with downstream answer likelihood, for use with black-box LMs. Across several open-domain QA benchmarks, AdaPCR outperforms baselines, most notably on multi-hop reasoning, which supports modeling inter-passage dependencies as a way to improve retrieval.

📝 Abstract
Retrieval-augmented generation (RAG) enhances large language models (LLMs) by incorporating external documents at inference time, enabling up-to-date knowledge access without costly retraining. However, conventional RAG methods retrieve passages independently, often leading to redundant, noisy, or insufficiently diverse context, which is particularly problematic in noisy corpora and for multi-hop questions. To address this, we propose Adaptive Passage Combination Retrieval (AdaPCR), a novel framework for open-domain question answering with black-box LMs. AdaPCR explicitly models dependencies between passages by considering passage combinations as units for retrieval and reranking. It consists of a context-aware query reformulation using concatenated passages, and a reranking step trained with a predictive objective aligned with downstream answer likelihood. Crucially, AdaPCR adaptively selects the number of retrieved passages without additional stopping modules. Experiments across several QA benchmarks show that AdaPCR outperforms baselines, particularly in multi-hop reasoning, demonstrating the effectiveness of modeling inter-passage dependencies for improved retrieval.
Problem

Research questions and friction points this paper is trying to address.

Retrieving redundant or noisy passages in RAG systems
Handling multi-hop questions with insufficient context diversity
Adaptively selecting optimal passage combinations without stopping modules
Innovation

Methods, ideas, or system contributions that make the work stand out.

Adaptive retrieval of passage combinations
Context-aware query reformulation with passages
Reranking aligned with answer likelihood
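
The three ideas listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the lexical retriever, the `score_combination` function, the stopword list, and the toy corpus are all assumptions made for illustration. In particular, the paper's reranker is trained against downstream answer likelihood, whereas this stand-in only counts query-term coverage and charges a fixed cost per passage. What the sketch does show is the overall shape: the query is reformulated by concatenating already-selected passages, candidates of different sizes are scored as whole combinations, and the number of passages falls out of the argmax rather than a separate stopping rule.

```python
# Illustrative sketch only: all names, data, and the scorer are hypothetical.
STOPWORDS = {"where", "was", "the", "of", "is", "a", "in", "by"}

CORPUS = [
    "Inception was directed by Christopher Nolan",
    "Christopher Nolan was born in London",
    "Inception is a 2010 science fiction film",
    "Paris is the capital of France",
]


def tokens(text):
    """Lowercased content words of a text."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def reformulate(query, combo):
    """Context-aware reformulation: query concatenated with chosen passages."""
    return " ".join([query, *combo])


def retrieve(query, passages, top_n):
    """Toy first-stage retriever ranking passages by token overlap."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:top_n]


def score_combination(query, combo):
    """Hypothetical stand-in for a learned combination-level reranker.

    Rewards query terms covered by the combination as a whole and
    charges a fixed cost per passage, so extra passages must add coverage.
    """
    covered = tokens(query) & set().union(*(tokens(p) for p in combo))
    return len(covered) - 0.5 * len(combo)


def select_combination(query, corpus, max_size=2, top_n=3):
    """Build combinations of size 1..max_size and return the best-scoring one."""
    frontier = [(p,) for p in retrieve(query, corpus, top_n)]
    candidates = list(frontier)
    for _ in range(max_size - 1):
        next_frontier = []
        for combo in frontier:
            remaining = [p for p in corpus if p not in combo]
            nxt = retrieve(reformulate(query, combo), remaining, 1)
            if nxt:
                next_frontier.append(combo + (nxt[0],))
        candidates.extend(next_frontier)
        frontier = next_frontier
    # Combination size is chosen implicitly by the argmax over all sizes.
    return max(candidates, key=lambda c: score_combination(query, c))


multi_hop = select_combination("Where was the director of Inception born", CORPUS)
single_hop = select_combination("capital of France", CORPUS)
print(len(multi_hop), len(single_hop))
```

On this toy corpus the multi-hop question ends up with a two-passage combination (the director passage plus the birthplace passage), while the single-hop question keeps one passage. The per-passage cost of 0.5 is an arbitrary choice; a learned scorer would make that trade-off from data.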