🤖 AI Summary
This work addresses critical challenges in multi-hop question answering with large language models (LLMs): low answer accuracy, poor answer faithfulness, and weak robustness to noisy or conflicting external knowledge. Inspired by the judicial Chain of Evidence (CoE) paradigm—where evidence is rigorously evaluated for logical coherence and mutual support—we introduce, for the first time, a CoE-inspired framework for LLM-based knowledge assessment. Our CoE-aware reasoning framework jointly models (i) relevance between retrieved knowledge and the query, and (ii) multi-hop logical consistency among knowledge snippets, and integrates seamlessly into retrieval-augmented generation (RAG) pipelines. Evaluated across five mainstream LLMs and three realistic RAG settings, our method consistently improves answer accuracy, faithfulness, and robustness against knowledge noise and contradictions. This establishes a novel, principled paradigm for trustworthy knowledge-enhanced reasoning.
📝 Abstract
Incorporating external knowledge into large language models (LLMs) has emerged as a promising approach to mitigating outdated knowledge and hallucination in LLMs. However, external knowledge is often imperfect: alongside useful knowledge, the context is rife with irrelevant information or misinformation that can impair the reliability of LLM responses. This paper studies the kind of external knowledge LLMs prefer in such imperfect contexts when handling multi-hop QA. Inspired by the Chain of Evidence (CoE) in criminal procedural law, we characterize the knowledge preferred by LLMs as maintaining both relevance to the question and mutual support among knowledge pieces. Accordingly, we propose an automated CoE discrimination approach and evaluate LLMs' effectiveness, faithfulness, and robustness with CoE, including its application in Retrieval-Augmented Generation (RAG). Tests on five LLMs show that CoE improves generation accuracy, answer faithfulness, and robustness to knowledge conflicts, and boosts the performance of existing approaches in three practical RAG scenarios.
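To make the CoE intuition concrete, the sketch below illustrates the two criteria the abstract names: seed a chain with snippets relevant to the question, then grow it with snippets that mutually support an accepted one (multi-hop bridging). This is a toy illustration, not the paper's actual discrimination algorithm: the function names (`build_coe`, `overlap`), the thresholds, and the use of Jaccard word overlap as a stand-in for a learned relevance/support model are all assumptions for demonstration only.

```python
import re

def _tokens(text):
    return set(re.findall(r"\w+", text.lower()))

def overlap(a, b):
    """Jaccard word overlap -- a crude stand-in for a learned relevance model."""
    ta, tb = _tokens(a), _tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def build_coe(question, snippets, rel_thresh=0.2, sup_thresh=0.1):
    """Seed the chain with question-relevant snippets, then iteratively add
    snippets that mutually support an accepted one (multi-hop bridging)."""
    chain = [s for s in snippets if overlap(question, s) >= rel_thresh]
    grew = True
    while grew:
        grew = False
        for s in snippets:
            if s not in chain and any(overlap(s, c) >= sup_thresh for c in chain):
                chain.append(s)
                grew = True
    return chain

question = "Who directed the film that won Best Picture in 1995?"
snippets = [
    "Forrest Gump won the Academy Award for Best Picture in 1995.",
    "Forrest Gump was directed by Robert Zemeckis.",   # bridges via "Forrest Gump"
    "The Houston Rockets are an NBA team.",            # noise: no relevance, no support
]
print(build_coe(question, snippets))
```

Note how the second snippet is weakly relevant to the question on its own but joins the chain through mutual support with the first, while the noise snippet is excluded on both counts.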