ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

📅 2025-05-30

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Existing RAG systems struggle to make effective use of retrieved documents, especially when supporting evidence is implicit, fragmented, or obscured by noise. This leads to weak clue extraction, unreliable reasoning, and poor interpretability. To address these challenges, we propose ClueAnchor, a framework for clue-anchored reasoning exploration and optimization. ClueAnchor extracts key clues from retrieved content, constructs multiple reasoning paths under different knowledge configurations, and applies reward-based preference optimization to train the model toward the most effective path. In experiments, ClueAnchor significantly outperforms prior RAG baselines in reasoning completeness and robustness. It remains resilient to noisy or partially relevant retrieval results and can identify supporting evidence even without explicit clue supervision at inference time.


📝 Abstract

Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved documents, failing to extract and integrate the key clues needed to support faithful and interpretable reasoning, especially in cases where relevant evidence is implicit, scattered, or obscured by noise. To address this issue, we propose ClueAnchor, a novel framework for enhancing RAG via clue-anchored reasoning exploration and optimization. ClueAnchor extracts key clues from retrieved content and generates multiple reasoning paths based on different knowledge configurations, optimizing the model by selecting the most effective one through reward-based preference optimization. Experiments show that ClueAnchor significantly outperforms prior RAG baselines in reasoning completeness and robustness. Further analysis confirms its strong resilience to noisy or partially relevant retrieved content, as well as its capability to identify supporting evidence even in the absence of explicit clue supervision during inference.
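The abstract describes generating several reasoning paths under different knowledge configurations and keeping the most effective one. The minimal sketch below illustrates that selection step only, under stated assumptions. The three configurations (`internal`, `documents`, `clue_anchored`), the prompt templates, the exact-match reward, and the helper names `build_prompts` and `build_preference_pair` are all hypothetical, not the paper's definitions. Model generation is omitted, so each path's answer is given directly.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class ReasoningPath:
    config: str  # hypothetical knowledge-configuration label
    answer: str  # final answer produced along this path


def build_prompts(question: str, documents: List[str], clues: List[str]) -> Dict[str, str]:
    """Hypothetical prompts for three knowledge configurations."""
    docs = "\n".join(documents)
    anchored = "\n".join(clues)
    return {
        # answer from the model's parametric knowledge only
        "internal": f"Question: {question}\nAnswer:",
        # answer from the raw retrieved documents
        "documents": f"Documents:\n{docs}\nQuestion: {question}\nAnswer:",
        # answer anchored on the extracted key clues
        "clue_anchored": f"Documents:\n{docs}\nKey clues:\n{anchored}\nQuestion: {question}\nAnswer:",
    }


def exact_match_reward(prediction: str, gold: str) -> float:
    """Hypothetical reward: 1.0 for a case-insensitive exact match, else 0.0."""
    return float(prediction.strip().lower() == gold.strip().lower())


def build_preference_pair(
    paths: List[ReasoningPath],
    gold: str,
    reward_fn: Callable[[str, str], float] = exact_match_reward,
) -> Optional[Tuple[ReasoningPath, ReasoningPath]]:
    """Return (chosen, rejected) from the highest- and lowest-reward paths."""
    scored = [(reward_fn(p.answer, gold), p) for p in paths]
    best_score, best = max(scored, key=lambda s: s[0])
    worst_score, worst = min(scored, key=lambda s: s[0])
    if best_score == worst_score:
        return None  # all paths tie, so there is no preference signal
    return best, worst


paths = [
    ReasoningPath("internal", "Paris"),
    ReasoningPath("documents", "Marseille"),
    ReasoningPath("clue_anchored", "Lyon"),
]
chosen, rejected = build_preference_pair(paths, gold="Lyon")
print(chosen.config, rejected.config)  # clue_anchored internal
```

Returning `None` when every path earns the same reward is a design choice for this sketch. A tie carries no signal about which configuration is better, so such examples would simply be skipped when assembling preference data.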

Problem

Research questions and friction points this paper is trying to address.

Underutilization of retrieved documents in RAG systems

Difficulty extracting implicit or scattered key clues

Poor reasoning robustness with noisy retrieved content

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts key clues from retrieved documents

Generates multiple reasoning paths under different knowledge configurations

Uses reward-based preference optimization to select the most effective path
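The summary refers to "reward-based preference optimization" without naming the training objective. A common way to train on (chosen, rejected) pairs is Direct Preference Optimization (DPO). The sketch below assumes a DPO-style loss, which may differ from the paper's exact objective. Its inputs are summed sequence log-probabilities under the policy being trained and under a frozen reference model. The default `beta = 0.1` is illustrative only and is not a value reported by the paper.

```python
import math


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def dpo_loss(
    policy_chosen: float,
    policy_rejected: float,
    ref_chosen: float,
    ref_rejected: float,
    beta: float = 0.1,
) -> float:
    """DPO-style loss for one (chosen, rejected) pair of sequence log-probabilities.

    loss = -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
    """
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    # -log(sigmoid(x)) == softplus(-x)
    return softplus(-margin)


# Policy identical to the reference: the margin is 0, so the loss equals log(2).
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy shifted probability toward the chosen path: the loss drops below log(2).
print(dpo_loss(-4.0, -6.0, -5.0, -5.0))
```

Minimizing this loss raises the policy's likelihood of the preferred path (for example, the clue-anchored one) relative to the rejected path, measured against the reference model. In this framing, no separately trained reward model is needed at optimization time.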

🔎 Similar Papers

MemoRAG: Boosting Long Context Processing with Global Memory-Enhanced Retrieval Augmentation