🤖 AI Summary
This work tackles two challenges in LLM-based scientific claim verification: the inherent uncertainty of LLM responses during evidence retrieval, and poor generalization across models and domains. The proposed framework, CIBER, probes an LLM with multiple paraphrased prompts and treats the consistency of its responses as the primary signal for distinguishing supporting from refuting evidence. Because this behavioral analysis requires no access to internal model parameters, CIBER applies to both white-box and black-box LLMs; because it is unsupervised, it generalizes readily across scientific domains. Combining retrieval-augmented generation (RAG), zero-shot evidence classification, and black-box behavioral modeling, CIBER outperforms conventional RAG methods across diverse scientific domains, including biology, physics, and medicine, and remains robust and accurate in evidence identification across LLMs of varying linguistic proficiency. Its unsupervised, consistency-driven evidence retrieval mechanism balances interpretability with broad generalizability.
📝 Abstract
In this paper, we introduce CIBER (Claim Investigation Based on Evidence Retrieval), an extension of the Retrieval-Augmented Generation (RAG) framework designed to identify corroborating and refuting documents as evidence for scientific claim verification. CIBER addresses the inherent uncertainty in Large Language Models (LLMs) by evaluating response consistency across diverse interrogation probes. By focusing on the behavioral analysis of LLMs without requiring access to their internal information, CIBER is applicable to both white-box and black-box models. Furthermore, CIBER operates in an unsupervised manner, enabling easy generalization across various scientific domains. Comprehensive evaluations conducted using LLMs with varying levels of linguistic proficiency reveal CIBER's superior performance compared to conventional RAG approaches. These findings not only highlight the effectiveness of CIBER but also provide valuable insights for future advancements in LLM-based scientific claim verification.
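The core idea of probing response consistency can be illustrated with a minimal sketch. This is not the paper's implementation: the probe templates, the `ask_llm` callable, and the majority-vote aggregation are all illustrative assumptions; CIBER's actual interrogation probes and scoring are defined in the paper itself.

```python
from collections import Counter

def classify_evidence(claim, document, ask_llm, probes):
    """Illustrative sketch of consistency-based evidence classification:
    ask the same question in several paraphrased forms, take the majority
    label, and report what fraction of probes agree with it."""
    answers = [ask_llm(p.format(claim=claim, document=document)) for p in probes]
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]   # majority label and its vote count
    consistency = votes / len(answers)         # agreement ratio in [0, 1]
    return label, consistency

# Toy stand-in for a real LLM call (assumption: any callable that
# maps a prompt string to a SUPPORT/REFUTE label works here).
def toy_llm(prompt):
    return "SUPPORT" if "rises" in prompt else "REFUTE"

# Hypothetical paraphrased interrogation probes.
probes = [
    "Does the passage '{document}' support the claim '{claim}'?",
    "Claim: {claim}\nEvidence: {document}\nDoes the evidence corroborate it?",
    "Would '{document}' refute '{claim}'? Answer SUPPORT or REFUTE.",
]

label, score = classify_evidence(
    "Sea level rises with warming",
    "Warming oceans expand and sea level rises.",
    toy_llm, probes,
)
# label == "SUPPORT", score == 1.0 (all probes agree)
```

A high consistency score indicates the model's judgment is stable under rephrasing; a low score flags the uncertainty that CIBER is designed to surface.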