🤖 AI Summary
This work tackles two challenges in LLM-based scientific claim verification: the inherent uncertainty of LLM responses during evidence retrieval, and poor generalization across models and domains. The proposed framework, CIBER, probes an LLM with multiple paraphrased prompts and treats the consistency of its responses as the primary signal for distinguishing supporting from refuting evidence. Because this behavioral analysis requires no access to internal model parameters, CIBER applies to both white-box and black-box LLMs; because it is unsupervised, it generalizes readily across scientific domains. Combining retrieval-augmented generation (RAG), zero-shot evidence classification, and black-box behavioral modeling, CIBER outperforms conventional RAG methods across diverse scientific domains, including biology, physics, and medicine, and remains robust and accurate in evidence identification across LLMs of varying linguistic proficiency. Its unsupervised, consistency-driven evidence retrieval mechanism balances interpretability with broad generalizability.
📝 Abstract
In this paper, we introduce CIBER (Claim Investigation Based on Evidence Retrieval), an extension of the Retrieval-Augmented Generation (RAG) framework designed to identify corroborating and refuting documents as evidence for scientific claim verification. CIBER addresses the inherent uncertainty in Large Language Models (LLMs) by evaluating response consistency across diverse interrogation probes. By focusing on the behavioral analysis of LLMs without requiring access to their internal information, CIBER is applicable to both white-box and black-box models. Furthermore, CIBER operates in an unsupervised manner, enabling easy generalization across various scientific domains. Comprehensive evaluations conducted using LLMs with varying levels of linguistic proficiency reveal CIBER's superior performance compared to conventional RAG approaches. These findings not only highlight the effectiveness of CIBER but also provide valuable insights for future advancements in LLM-based scientific claim verification.
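The core idea of probing response consistency can be illustrated with a minimal sketch. This is not the paper's implementation: the probe templates, the `ask_llm` callable, and the majority-vote aggregation are all illustrative assumptions; CIBER's actual interrogation probes and scoring are defined in the paper itself.

```python
from collections import Counter

def classify_evidence(claim, document, ask_llm, probes):
    """Illustrative sketch of consistency-based evidence classification:
    ask the same question in several paraphrased forms, take the majority
    label, and report what fraction of probes agree with it."""
    answers = [ask_llm(p.format(claim=claim, document=document)) for p in probes]
    counts = Counter(answers)
    label, votes = counts.most_common(1)[0]   # majority label and its vote count
    consistency = votes / len(answers)         # agreement ratio in [0, 1]
    return label, consistency

# Toy stand-in for a real LLM call (assumption: any callable that
# maps a prompt string to a SUPPORT/REFUTE label works here).
def toy_llm(prompt):
    return "SUPPORT" if "rises" in prompt else "REFUTE"

# Hypothetical paraphrased interrogation probes.
probes = [
    "Does the passage '{document}' support the claim '{claim}'?",
    "Claim: {claim}\nEvidence: {document}\nDoes the evidence corroborate it?",
    "Would '{document}' refute '{claim}'? Answer SUPPORT or REFUTE.",
]

label, score = classify_evidence(
    "Sea level rises with warming",
    "Warming oceans expand and sea level rises.",
    toy_llm, probes,
)
# label == "SUPPORT", score == 1.0 (all probes agree)
```

A high consistency score indicates the model's judgment is stable under rephrasing; a low score flags the uncertainty that CIBER is designed to surface.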