CAFE: Retrieval Head-based Coarse-to-Fine Information Seeking to Enhance Multi-Document QA Capability

📅 2025-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address imprecise retrieval and reasoning interference from irrelevant documents in long-context multi-document question answering, this paper proposes CAFE, a retrieval-head-driven, coarse-to-fine, two-stage framework. First, retrieval heads perform coarse-grained, document-level filtering by identifying and ranking relevant documents. Then, attention-guided evidence localization steers the model toward the key supporting passages, with joint fine-tuning of the Mistral model to strengthen its dependence on evidence. By progressively suppressing background and distracting documents, the framework aims to keep recall high while improving retrieval precision. On multi-document QA benchmarks, the method achieves up to 22.1% and 13.7% SubEM improvement over supervised fine-tuning (SFT) and retrieval-augmented generation (RAG) baselines, respectively, on the Mistral model. The summary presents this as a scalable, interpretable approach to long-context QA that targets retrieval precision and recall together.

📝 Abstract

Advancements in Large Language Models (LLMs) have extended their input context length, yet they still struggle with retrieval and reasoning over long-context inputs. Existing methods use prompting strategies and retrieval heads to alleviate this limitation. However, they still face challenges in balancing retrieval precision and recall, which limits their efficacy in answering questions. To address this, we introduce $\textbf{CAFE}$, a two-stage coarse-to-fine method to enhance multi-document question-answering capabilities. By gradually eliminating the negative impacts of background and distracting documents, CAFE makes the responses more reliant on the evidence documents. First, a coarse-grained filtering method leverages retrieval heads to identify and rank relevant documents. Then, a fine-grained steering method guides attention to the most relevant content. Experiments across benchmarks show that CAFE outperforms baselines, achieving up to 22.1% and 13.7% SubEM improvement over SFT and RAG methods on the Mistral model, respectively.
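
The coarse-grained stage described in the abstract can be illustrated with a small sketch. It assumes that attention weights from the question to each context token have already been extracted for a set of retrieval heads, and that each document occupies a known token span. The function name `rank_documents`, the use of summed attention mass as the relevance score, and the `top_k` cutoff are illustrative assumptions, not details taken from the paper.

```python
from typing import List, Sequence, Tuple


def rank_documents(
    attn: Sequence[Sequence[float]],
    doc_spans: Sequence[Tuple[int, int]],
    top_k: int,
) -> List[int]:
    """Rank documents by the attention mass that retrieval heads place on them.

    attn[h][t] is the (assumed, precomputed) attention weight that head h
    assigns to context token t when reading the question.
    doc_spans[i] is the half-open token range [start, end) of document i.
    Returns the indices of the top_k highest-scoring documents, best first.
    """
    num_heads = len(attn)
    scores = []
    for start, end in doc_spans:
        # Sum the attention inside the span, then average over heads.
        # (Hypothetical scoring rule; a length-normalized mean is another option.)
        mass = sum(sum(head[start:end]) for head in attn) / num_heads
        scores.append(mass)
    order = sorted(range(len(doc_spans)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]


# Toy example: two heads, six context tokens, three two-token documents.
toy_attn = [
    [0.05, 0.05, 0.40, 0.30, 0.10, 0.10],
    [0.10, 0.00, 0.50, 0.20, 0.10, 0.10],
]
toy_spans = [(0, 2), (2, 4), (4, 6)]
kept = rank_documents(toy_attn, toy_spans, top_k=2)
```

In this toy input the middle document receives most of the attention mass, so it is ranked first and the lowest-scoring document is filtered out before the fine-grained stage.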

Problem

Research questions and friction points this paper is trying to address.

LLMs still struggle with retrieval and reasoning over long-context inputs

Existing methods fail to balance precision and recall in document retrieval

Background and distracting documents reduce multi-document question-answering accuracy
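
The results on this page are reported in SubEM, which is not defined here. A minimal sketch follows, assuming SubEM means substring exact match (a prediction counts as correct if any normalized gold answer appears inside the normalized prediction) and assuming SQuAD-style normalization. Both `normalize` and `sub_em` are hypothetical helper names, and the paper's exact normalization may differ.

```python
import re
import string
from typing import Iterable


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def sub_em(prediction: str, gold_answers: Iterable[str]) -> bool:
    """Return True if any non-empty normalized gold answer is a substring
    of the normalized prediction."""
    pred = normalize(prediction)
    golds = [normalize(g) for g in gold_answers]
    return any(g and g in pred for g in golds)


hit = sub_em("The capital is Paris, France.", ["Paris"])
miss = sub_em("It is Lyon.", ["Paris"])
```

Unlike strict exact match, this criterion tolerates extra words around the answer, which suits free-form generations from long-context models.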

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage coarse-to-fine document filtering

Retrieval head-based relevance ranking

Fine-grained attention steering for evidence focus
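
One way to picture fine-grained attention steering is as an additive bias on the pre-softmax attention scores of tokens that belong to evidence documents. The sketch below shows only that general idea on a single attention row. The function `steer_attention`, the `bias` parameter, and the choice of an additive bias are assumptions made for illustration; this page does not specify the mechanism the paper actually uses.

```python
import math
from typing import List, Sequence, Set


def steer_attention(
    logits: Sequence[float],
    evidence: Set[int],
    bias: float,
) -> List[float]:
    """Softmax over one row of attention logits after adding `bias`
    to the positions listed in `evidence` (hypothetical steering rule)."""
    shifted = [x + bias if i in evidence else x for i, x in enumerate(logits)]
    peak = max(shifted)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in shifted]
    total = sum(exps)
    return [e / total for e in exps]


# Four context tokens with equal scores; token 1 belongs to an evidence document.
plain = steer_attention([0.0, 0.0, 0.0, 0.0], evidence=set(), bias=0.0)
steered = steer_attention([0.0, 0.0, 0.0, 0.0], evidence={1}, bias=math.log(3.0))
```

With a bias of ln 3, the evidence token's weight rises from one quarter to one half while the weights still sum to one, so attention is shifted toward the evidence rather than hard-masking everything else.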
