VerifiAgent: a Unified Verification Agent in Language Model Reasoning

📅 2025-04-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) frequently produce unreliable reasoning outputs; existing verification methods suffer from poor generalizability, high computational overhead, and limited cross-task scalability. To address this, we propose VerifiAgent—the first unified, two-level verification agent: at the meta-level, it assesses response completeness and logical consistency; at the tool-level, it adaptively invokes specialized verifiers—such as formal provers, logical checkers, or commonsense knowledge bases—based on the inferred reasoning type (mathematical, logical, or commonsense). Our key innovations include: (1) a reasoning-type-aware mechanism for autonomous tool selection, and (2) verification-feedback-driven reasoning refinement coupled with a low-cost scaling strategy. Experiments across diverse reasoning benchmarks demonstrate that VerifiAgent significantly outperforms baselines—including deductive verifiers and backward verifiers—in accuracy and robustness. It improves the base model’s performance and, in mathematical reasoning, achieves superior results with fewer samples compared to state-of-the-art process reward models.

📝 Abstract
Large language models demonstrate remarkable reasoning capabilities but often produce unreliable or incorrect responses. Existing verification methods are typically model-specific or domain-restricted, requiring significant computational resources and lacking scalability across diverse reasoning tasks. To address these limitations, we propose VerifiAgent, a unified verification agent that integrates two levels of verification: meta-verification, which assesses completeness and consistency in model responses, and tool-based adaptive verification, where VerifiAgent autonomously selects appropriate verification tools based on the reasoning type, including mathematical, logical, or commonsense reasoning. This adaptive approach ensures both efficiency and robustness across different verification scenarios. Experimental results show that VerifiAgent outperforms baseline verification methods (e.g., deductive verifier, backward verifier) across all reasoning tasks. Additionally, it can further enhance reasoning accuracy by leveraging feedback from verification results. VerifiAgent can also be effectively applied to inference scaling, achieving better results with fewer generated samples and costs compared to existing process reward models in the mathematical reasoning domain. Code is available at https://github.com/Jiuzhouh/VerifiAgent
Problem

Research questions and friction points this paper is trying to address.

Improves reliability of language model reasoning outputs
Unifies verification across diverse reasoning tasks
Reduces computational costs while enhancing accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified verification agent for diverse reasoning tasks
Meta-verification assesses response completeness and consistency
Tool-based adaptive verification selects optimal verification tools
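The two-level design above can be sketched in a few lines. This is a minimal illustrative stand-in, not the paper's implementation: the helper names (`classify_reasoning`, `meta_verify`, the verifier stubs in `VERIFIERS`) are hypothetical, and in VerifiAgent these steps are carried out by an LLM agent calling real tools such as formal provers, logic checkers, and commonsense knowledge bases.

```python
from typing import Callable, Dict

def classify_reasoning(question: str) -> str:
    """Crude keyword stand-in for the agent's reasoning-type inference."""
    q = question.lower()
    if any(tok in q for tok in ("+", "*", "solve", "compute")):
        return "mathematical"
    if any(tok in q for tok in ("implies", "therefore", "premise")):
        return "logical"
    return "commonsense"

def meta_verify(response: str) -> bool:
    """Meta-level check: the response is non-empty and states a final answer."""
    return bool(response.strip()) and "answer:" in response.lower()

# Tool-level verifiers, keyed by reasoning type. Each lambda is a stub
# standing in for a real tool (formal prover, logic checker, knowledge base).
VERIFIERS: Dict[str, Callable[[str, str], bool]] = {
    "mathematical": lambda q, r: r.lower().split("answer:")[-1].strip().lstrip("-").isdigit(),
    "logical": lambda q, r: "contradiction" not in r.lower(),
    "commonsense": lambda q, r: True,
}

def verify(question: str, response: str) -> bool:
    """Two-level verification: meta check first, then the adaptive tool check."""
    if not meta_verify(response):
        return False
    rtype = classify_reasoning(question)
    return VERIFIERS[rtype](question, response)
```

A failed `verify` call would, in the full agent, trigger verification-feedback-driven refinement: the response is regenerated with the verifier's feedback appended, which is also how the paper's low-cost inference-scaling strategy reuses the same verdicts to rank samples.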