🤖 AI Summary
This paper addresses a limitation of large language models (LLMs) in real-world information-seeking tasks that require integrating multi-source evidence to verify a hypothesis, introducing the “integrative grounding” task and an accompanying evaluation framework. Methodologically, it repurposes data from four domains into a cross-domain benchmark, examines a premise-abduction–driven retrieval planning strategy, and incorporates a zero-shot self-reflection mechanism to improve evidence-verification quality. Key contributions include: (1) revealing that LLMs tend to rationalize from internal parametric knowledge when evidence is incomplete; (2) demonstrating that premise abduction substantially outperforms undirected retrieval planning by suppressing noise and improving evidence relevance; and (3) finding that while LLMs are robust to redundant evidence, external constraints are needed to curb overreliance on internal knowledge. Empirical results support the framework’s effectiveness in improving multi-evidence reasoning.
📝 Abstract
Grounding large language models (LLMs) in external knowledge sources is a promising method for faithful prediction. While existing grounding approaches work well for simple queries, many real-world information needs require synthesizing multiple pieces of evidence. We introduce "integrative grounding" -- the challenge of retrieving and verifying multiple inter-dependent pieces of evidence to support a hypothesis query. To systematically study this problem, we repurpose data from four domains for evaluating integrative grounding capabilities. Our investigation reveals two critical findings: First, in groundedness verification, while LLMs are robust to redundant evidence, they tend to rationalize using internal knowledge when information is incomplete. Second, in examining retrieval planning strategies, we find that undirected planning can degrade performance through noise introduction, while premise abduction emerges as a promising approach due to its logical constraints. Additionally, LLMs' zero-shot self-reflection capabilities consistently improve grounding quality. These insights provide valuable direction for developing more effective integrative grounding systems.
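The premise-abduction idea in the abstract can be illustrated with a minimal sketch: abduce the premises a hypothesis logically requires, retrieve evidence for each premise separately, and declare the hypothesis grounded only if every premise is supported. All function names, the toy decomposition, and the lexical retriever below are illustrative assumptions, not the paper's actual implementation (which would use an LLM for abduction and a real retriever).

```python
# Hypothetical sketch of premise-abduction-driven retrieval planning.
# The decomposition and retriever are deliberately naive stand-ins.

def abduce_premises(hypothesis: str) -> list[str]:
    """Stand-in for an LLM call that decomposes a hypothesis into
    the atomic premises that must each be supported by evidence.
    Toy version: split on " and "."""
    return [p.strip() for p in hypothesis.split(" and ")]

def retrieve(premise: str, corpus: list[str]) -> list[str]:
    """Naive lexical retriever: return passages sharing any word
    with the premise (a real system would use dense retrieval)."""
    words = set(premise.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def verify_grounding(premises: list[str], corpus: list[str]):
    """A hypothesis is integratively grounded only if *every*
    abduced premise has at least one supporting passage."""
    evidence = {p: retrieve(p, corpus) for p in premises}
    grounded = all(evidence[p] for p in premises)
    return grounded, evidence

corpus = [
    "aspirin inhibits platelet aggregation",
    "platelet aggregation drives clot formation",
]
hypothesis = ("aspirin inhibits platelet aggregation and "
              "platelet aggregation drives clot formation")
grounded, evidence = verify_grounding(abduce_premises(hypothesis), corpus)
```

Because each retrieval query is constrained to a single abduced premise, off-topic passages are less likely to enter the evidence pool than with one undirected query for the whole hypothesis, which is the noise-suppression effect the summary describes.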