ClueAnchor: Clue-Anchored Knowledge Reasoning Exploration and Optimization for Retrieval-Augmented Generation

📅 2025-05-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing RAG systems struggle to make effective use of retrieved documents, especially when supporting evidence is implicit, scattered, or obscured by noise, which leads to weak clue extraction and less faithful, less interpretable reasoning. To address this, the paper proposes ClueAnchor, a framework for clue-anchored reasoning exploration and optimization. ClueAnchor extracts key clues from the retrieved content, generates multiple reasoning paths under different knowledge configurations, and optimizes the model by selecting the most effective path through reward-based preference optimization. In the reported experiments, ClueAnchor significantly outperforms prior RAG baselines in reasoning completeness and robustness. Further analysis indicates that it is resilient to noisy or partially relevant retrieval results and can identify supporting evidence without explicit clue supervision at inference time.

📝 Abstract
Retrieval-Augmented Generation (RAG) augments Large Language Models (LLMs) with external knowledge to improve factuality. However, existing RAG systems frequently underutilize the retrieved documents, failing to extract and integrate the key clues needed to support faithful and interpretable reasoning, especially in cases where relevant evidence is implicit, scattered, or obscured by noise. To address this issue, we propose ClueAnchor, a novel framework for enhancing RAG via clue-anchored reasoning exploration and optimization. ClueAnchor extracts key clues from retrieved content and generates multiple reasoning paths based on different knowledge configurations, optimizing the model by selecting the most effective one through reward-based preference optimization. Experiments show that ClueAnchor significantly outperforms prior RAG baselines in reasoning completeness and robustness. Further analysis confirms its strong resilience to noisy or partially relevant retrieved content, as well as its capability to identify supporting evidence even in the absence of explicit clue supervision during inference.
Problem

Research questions and friction points this paper is trying to address.

Underutilization of retrieved documents in RAG systems
Difficulty extracting implicit or scattered key clues
Poor reasoning robustness with noisy retrieved content
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extracts key clues from retrieved documents
Generates multiple reasoning paths under different knowledge configurations
Selects the most effective path through reward-based preference optimization
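
The three steps listed above can be illustrated with a small sketch of how candidate reasoning paths might be scored and turned into a preference pair for optimization. This is not the authors' implementation: the configuration labels (`internal_only`, `retrieved_docs`, `clue_anchored`), the `ReasoningPath` type, the token-overlap F1 reward, and the toy data are all assumptions made for illustration, since the abstract does not specify the reward or the configurations.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ReasoningPath:
    config: str      # hypothetical label for the knowledge configuration
    reasoning: str   # free-text reasoning produced under that configuration
    answer: str      # final answer extracted from the reasoning


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1, used here as a stand-in reward (an assumption)."""
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def build_preference_pair(paths, gold):
    """Return (chosen, rejected) by reward, or None if no preference exists."""
    if not paths:
        return None
    scored = sorted(paths, key=lambda p: token_f1(p.answer, gold), reverse=True)
    best, worst = scored[0], scored[-1]
    # If every path earns the same reward there is no signal to learn from.
    if token_f1(best.answer, gold) == token_f1(worst.answer, gold):
        return None
    return best, worst


# Toy candidates, one per hypothetical knowledge configuration.
paths = [
    ReasoningPath("internal_only", "Answering from memory.", "Lyon"),
    ReasoningPath("retrieved_docs", "The documents mention several cities.", "the city of Paris"),
    ReasoningPath("clue_anchored", "Clue: 'the capital is Paris'.", "Paris"),
]
pair = build_preference_pair(paths, gold="Paris")
if pair is not None:
    chosen, rejected = pair
    print(chosen.config, rejected.config)
```

In a training pipeline, the resulting (chosen, rejected) pairs would feed a preference-optimization step; which objective the paper uses for that step is not stated in the text above.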