🤖 AI Summary
This study addresses the difficulty large language models (LLMs) have in integrating and adjudicating among conflicting information from three sources: their internal parametric knowledge, user-provided assertions, and retrieved documents. Failures here compromise system reliability. Prior work is limited to a binary conflict paradigm that pits parametric knowledge against either a document or a user; this work instead proposes a three-source interaction framework and systematically evaluates 27 LLMs from 3 families on 2 datasets. The findings show that most models rely more on document assertions than on user assertions, and that post-training reinforces this preference. Behavioral analysis further shows that most models are impressionable and cannot effectively discriminate between helpful and harmful external information, while fine-tuning on diverse source interaction data significantly improves that ability.
📝 Abstract
Large language models (LLMs) often need to balance their internal parametric knowledge with external information, such as user beliefs and content from retrieved documents, in real-world settings like retrieval-augmented generation (RAG) or chat-based systems. A model's ability to process these sources reliably is key to system safety. Previous studies on knowledge conflict and sycophancy are limited to a binary conflict paradigm: they primarily explore conflicts between parametric knowledge and either a document or a user, and ignore the interactive setting in which all three sources are present simultaneously. To fill this gap, we propose a three-source interaction framework and systematically evaluate 27 LLMs from 3 families on 2 datasets. Our findings reveal general patterns: most models rely more on document assertions than on user assertions, and this preference is reinforced by post-training. Furthermore, our behavioral analysis shows that most models are impressionable, unable to effectively discriminate between helpful and harmful external information. To address this, we demonstrate that fine-tuning on diverse source interaction data can significantly improve a model's ability to discriminate between the two. In short, our work paves the way for developing trustworthy LLMs that can effectively and reliably integrate multiple sources of information. Code is available at https://github.com/shuowl/llm-source-balancing.