AI Summary
Existing factuality verification methods predominantly rely on binary judgments, which fail to capture the severity of factual errors and thus limit their utility in fine-grained evaluation and preference optimization. To address this limitation, this work proposes DiVA, a novel framework that introduces a hybrid agentic-discriminative architecture for end-to-end fine-grained factuality verification. DiVA integrates a large language model-driven agent for active retrieval with a discriminative scoring model to assess factual consistency at a granular level. Furthermore, the authors construct FGVeriBench, a new benchmark designed to support fine-grained factuality evaluation. Experimental results demonstrate that DiVA significantly outperforms existing approaches on FGVeriBench, particularly excelling in general and multi-hop question scenarios.
Abstract
Despite the significant advancements of Large Language Models (LLMs), their factuality remains a critical challenge, fueling growing interest in factuality verification. Existing research on factuality verification primarily conducts binary judgments (e.g., correct or incorrect), which fails to distinguish varying degrees of error severity. This limits its utility for applications such as fine-grained evaluation and preference optimization. To bridge this gap, we propose the Agentic Discriminative Verifier (DiVA), a hybrid framework that synergizes the agentic search capabilities of generative models with the precise scoring aptitude of discriminative models. We also construct a new benchmark, FGVeriBench, as a robust testbed for fine-grained factuality verification. Experimental results on FGVeriBench demonstrate that our DiVA significantly outperforms existing methods on factuality verification for both general and multi-hop questions.