🤖 AI Summary
In RAG systems, conflicts frequently arise between externally retrieved knowledge and the LLM's parametric knowledge, manifesting as factual, temporal, or semantic inconsistencies that degrade question-answering (QA) performance. To address this, the paper proposes Micro-Act, a framework that replaces static side-by-side context comparison with a hierarchical, executable action space. The approach dynamically assesses reasoning complexity and decomposes knowledge-consistency verification into fine-grained, adaptive, executable steps. Methodologically, it comprises hierarchical action modeling, multi-type conflict identification, and conflict resolution mechanisms. Evaluated across five benchmark datasets and three conflict categories (factual, temporal, semantic), Micro-Act outperforms state-of-the-art baselines: it achieves significant accuracy gains on temporal and semantic conflict cases, incurs no performance degradation on non-conflict questions, and demonstrates strong robustness and generalization.
📝 Abstract
Retrieval-Augmented Generation (RAG) systems commonly suffer from Knowledge Conflicts, where retrieved external knowledge contradicts the inherent, parametric knowledge of large language models (LLMs). These conflicts adversely affect performance on downstream tasks such as question answering (QA). Existing approaches often attempt to mitigate conflicts by directly comparing the two knowledge sources side by side, but this can overwhelm LLMs with extraneous or lengthy contexts, ultimately hindering their ability to identify and resolve inconsistencies. To address this issue, we propose Micro-Act, a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. These comparisons are represented as actionable steps, enabling reasoning beyond the superficial context. Through extensive experiments on five benchmark datasets, Micro-Act consistently achieves significant gains in QA accuracy over state-of-the-art baselines across all five datasets and three conflict types, especially the temporal and semantic types, where all baselines fail significantly. More importantly, Micro-Act simultaneously maintains robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.
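The core idea of decomposing a side-by-side comparison into fine-grained, actionable steps can be illustrated with a toy sketch. All names here (`Action`, `decompose`, `plan_actions`, `detect_conflict`) and the naive word-overlap conflict check are illustrative assumptions, not the paper's actual implementation; a real system would drive each step with an LLM rather than string heuristics.

```python
# Toy sketch of an actionable comparison space for conflict detection.
# Hypothetical names throughout; the word-overlap check stands in for an
# LLM judgment in the actual framework.
from dataclasses import dataclass

@dataclass
class Action:
    name: str       # kind of fine-grained step, e.g. "compare_claim"
    payload: str    # the text span(s) this step operates on

def decompose(context: str) -> list[str]:
    """Split a knowledge source into atomic claims (naive sentence split)."""
    return [s.strip() for s in context.split(".") if s.strip()]

def plan_actions(external: str, parametric: str) -> list[Action]:
    """Instead of one monolithic side-by-side comparison, emit one
    fine-grained comparison action per pair of atomic claims."""
    return [
        Action("compare_claim", f"{e} || {p}")
        for e in decompose(external)
        for p in decompose(parametric)
    ]

def detect_conflict(action: Action) -> bool:
    """Toy check: claims that mostly overlap but are not identical are
    flagged as conflicting (stand-in for an LLM consistency judgment)."""
    e, p = action.payload.split(" || ")
    e_words, p_words = set(e.lower().split()), set(p.lower().split())
    overlap = len(e_words & p_words) / max(len(e_words | p_words), 1)
    return overlap > 0.5 and e_words != p_words

external = "Paris is the capital of France. The Eiffel Tower opened in 1889."
parametric = "Paris is the capital of France. The Eiffel Tower opened in 1887."

conflicts = [a for a in plan_actions(external, parametric) if detect_conflict(a)]
print(len(conflicts))  # only the 1889-vs-1887 claim pair is flagged
```

Even in this toy form, the benefit is visible: the conflicting date pair is isolated as a single small comparison step, rather than being buried inside one long two-context prompt.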