Veri-R1: Toward Precise and Faithful Claim Verification via Online Reinforcement Learning

📅 2025-10-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing LLM-based claim verification methods rely on static prompting or predefined pipelines and lack a unified training paradigm that jointly improves planning, retrieval, and reasoning. Method: We propose Veri-R1, the first end-to-end online reinforcement learning framework for claim verification, which lets the model dynamically invoke a search engine for iterative evidence retrieval and multi-step logical reasoning. Explicit reward signals jointly optimize interactive decision-making (e.g., query formulation and search timing) and logical generation (e.g., evidence integration and the final verdict). Contribution/Results: Experiments show Veri-R1 achieves up to a 30% absolute gain in joint accuracy and doubles the evidence score relative to prior methods. It consistently outperforms larger models across multiple benchmarks, demonstrating strong effectiveness, skill coordination, and scalability in end-to-end verification.
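The summary above describes reward signals that jointly score interactive decision-making and logical generation. A minimal sketch of what such a composite reward could look like is below; the field names, weights, and the evidence-F1 formulation are illustrative assumptions, not the paper's actual reward design.

```python
from dataclasses import dataclass

@dataclass
class VerificationRollout:
    # Hypothetical rollout record; fields are assumptions for illustration.
    predicted_label: str         # model's final verdict
    gold_label: str              # ground-truth verdict
    retrieved_ids: set           # evidence doc ids cited by the model
    gold_evidence_ids: set       # annotated gold evidence doc ids
    output_is_well_formed: bool  # reasoning/verdict structure parsed correctly

def composite_reward(r: VerificationRollout,
                     w_label: float = 1.0,
                     w_evidence: float = 1.0,
                     w_format: float = 0.2) -> float:
    """Combine verdict correctness, evidence overlap (F1), and format
    compliance into one scalar reward for policy optimization."""
    label_r = 1.0 if r.predicted_label == r.gold_label else 0.0
    if r.retrieved_ids and r.gold_evidence_ids:
        tp = len(r.retrieved_ids & r.gold_evidence_ids)
        prec = tp / len(r.retrieved_ids)
        rec = tp / len(r.gold_evidence_ids)
        evid_r = 2 * prec * rec / (prec + rec) if (prec + rec) else 0.0
    else:
        evid_r = 0.0
    fmt_r = 1.0 if r.output_is_well_formed else 0.0
    return w_label * label_r + w_evidence * evid_r + w_format * fmt_r
```

Scoring the evidence set separately from the verdict is what allows training to reward faithful retrieval rather than lucky guesses.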

📝 Abstract
Claim verification with large language models (LLMs) has recently attracted considerable attention, owing to their superior reasoning capabilities and transparent verification pathways compared to traditional answer-only judgments. Online claim verification requires iterative evidence retrieval and reasoning, yet existing approaches rely mainly on prompt engineering or predesigned reasoning workflows without offering a unified training paradigm to improve the necessary skills. Therefore, we introduce Veri-R1, an online reinforcement learning (RL) framework that enables an LLM to interact with a search engine and to receive reward signals that explicitly shape its planning, retrieval, and reasoning behaviors. The dynamic interaction between models and retrieval systems more accurately reflects real-world verification scenarios and fosters comprehensive verification skills. Empirical results show that Veri-R1 improves joint accuracy by up to 30% and doubles the evidence score, often surpassing larger-scale counterparts. Ablation studies further reveal the impact of reward components and the link between output logits and label accuracy. Our results highlight the effectiveness of online RL for precise and faithful claim verification and provide a foundation for future research. We release our code to support community progress in LLM-empowered claim verification.
Problem

Research questions and friction points this paper is trying to address.

Existing methods rely on prompt engineering or predesigned workflows, limiting verification accuracy
Iterative evidence retrieval and reasoning require dynamic interaction with a search engine
No unified training paradigm exists to jointly improve planning, retrieval, and reasoning skills
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online reinforcement learning framework for claim verification
Dynamic interaction between LLM and search engine
Reward signals shape planning, retrieval, and reasoning behaviors
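The dynamic LLM-search interaction listed above can be sketched as an action loop in which the policy alternates between issuing queries and committing to a verdict. This is a hypothetical outline, assuming `llm_step` and `search` callables that stand in for the policy model and the retrieval backend; the action schema and round budget are illustrative, not the paper's interface.

```python
def verify_claim(claim, llm_step, search, max_rounds=4):
    """Iterative verification loop: at each round the policy either emits
    a search action (query the retrieval backend, accumulate evidence)
    or a verdict action (terminate with a label)."""
    evidence = []
    for _ in range(max_rounds):
        # Assumed action format: {"type": "search", "query": ...}
        # or {"type": "verdict", "label": ...}
        action = llm_step(claim, evidence)
        if action["type"] == "search":
            evidence.extend(search(action["query"]))
        else:
            return action["label"], evidence
    # Round budget exhausted: force a final verdict from evidence so far.
    return llm_step(claim, evidence, force_verdict=True)["label"], evidence
```

Running this loop during training, rather than over a fixed retrieval pipeline, is what makes the setup "online": the reward observed at the end depends on the queries the policy itself chose to issue.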