🤖 AI Summary
This paper addresses a limitation of large language models (LLMs) in real-world information-seeking tasks that require integrating multi-source evidence to verify a hypothesis, introducing the “integrative grounding” task and an accompanying evaluation framework. Methodologically, it repurposes data from four domains into a cross-domain benchmark, examines a premise-abduction–driven retrieval planning strategy, and incorporates a zero-shot self-reflection mechanism to improve evidence-verification quality. Key contributions include: (1) revealing that LLMs tend to rationalize from internal parametric knowledge when evidence is incomplete; (2) demonstrating that premise abduction substantially outperforms undirected retrieval planning by suppressing noise and improving evidence relevance; and (3) finding that while LLMs are robust to redundant evidence, external constraints are needed to curb overreliance on internal knowledge. Empirical results support the framework’s effectiveness in improving multi-evidence reasoning.
📝 Abstract
Grounding large language models (LLMs) in external knowledge sources is a promising method for faithful prediction. While existing grounding approaches work well for simple queries, many real-world information needs require synthesizing multiple pieces of evidence. We introduce "integrative grounding" -- the challenge of retrieving and verifying multiple inter-dependent pieces of evidence to support a hypothesis query. To systematically study this problem, we repurpose data from four domains for evaluating integrative grounding capabilities. Our investigation reveals two critical findings: First, in groundedness verification, while LLMs are robust to redundant evidence, they tend to rationalize using internal knowledge when information is incomplete. Second, in examining retrieval planning strategies, we find that undirected planning can degrade performance through noise introduction, while premise abduction emerges as a promising approach due to its logical constraints. Additionally, LLMs' zero-shot self-reflection capabilities consistently improve grounding quality. These insights provide valuable direction for developing more effective integrative grounding systems.
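The premise-abduction idea in the abstract can be illustrated with a minimal sketch: abduce the premises a hypothesis logically requires, retrieve evidence for each premise separately, and declare the hypothesis grounded only if every premise is supported. All function names, the toy decomposition, and the lexical retriever below are illustrative assumptions, not the paper's actual implementation (which would use an LLM for abduction and a real retriever).

```python
# Hypothetical sketch of premise-abduction-driven retrieval planning.
# The decomposition and retriever are deliberately naive stand-ins.

def abduce_premises(hypothesis: str) -> list[str]:
    """Stand-in for an LLM call that decomposes a hypothesis into
    the atomic premises that must each be supported by evidence.
    Toy version: split on " and "."""
    return [p.strip() for p in hypothesis.split(" and ")]

def retrieve(premise: str, corpus: list[str]) -> list[str]:
    """Naive lexical retriever: return passages sharing any word
    with the premise (a real system would use dense retrieval)."""
    words = set(premise.lower().split())
    return [doc for doc in corpus if words & set(doc.lower().split())]

def verify_grounding(premises: list[str], corpus: list[str]):
    """A hypothesis is integratively grounded only if *every*
    abduced premise has at least one supporting passage."""
    evidence = {p: retrieve(p, corpus) for p in premises}
    grounded = all(evidence[p] for p in premises)
    return grounded, evidence

corpus = [
    "aspirin inhibits platelet aggregation",
    "platelet aggregation drives clot formation",
]
hypothesis = ("aspirin inhibits platelet aggregation and "
              "platelet aggregation drives clot formation")
grounded, evidence = verify_grounding(abduce_premises(hypothesis), corpus)
```

Because each retrieval query is constrained to a single abduced premise, off-topic passages are less likely to enter the evidence pool than with one undirected query for the whole hypothesis, which is the noise-suppression effect the summary describes.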