🤖 AI Summary
Multi-hop claim verification faces the challenge of dynamic coupling between reasoning and search: verification requires multi-step logical inference, while each step depends on iterative retrieval of bridging facts—creating mutual constraints. This paper proposes a two-tier agent collaboration framework: a high-level reasoning agent generates interpretable verification chains and decomposed sub-questions; a low-level search agent dynamically retrieves evidence, rewrites queries, and filters supporting facts based on intermediate reasoning results. The framework explicitly models the bidirectional closed loop of "reasoning-driven search" and "search-guided reasoning," combining a hierarchical architecture, reinforcement learning with outcome-oriented rewards, and dynamic sub-question generation. The approach achieves state-of-the-art performance on the EX-FEVER and HOVER benchmarks, significantly improving both accuracy and interpretability, and establishes a novel paradigm for multi-hop fact verification.
📝 Abstract
Multi-hop claim verification is inherently challenging, requiring multi-step reasoning to construct verification chains while iteratively searching for information to uncover hidden bridging facts. This process is fundamentally interleaved: effective reasoning relies on dynamically retrieved evidence, while effective search demands reasoning to refine queries based on partial information. To capture this interplay, we propose Hierarchical Agent Reasoning and Information Search (HARIS), which explicitly models the coordinated process of reasoning-driven searching and search-informed reasoning. HARIS consists of a high-level reasoning agent that focuses on constructing the main verification chain, generating factual questions when more information is needed, and a low-level search agent that iteratively retrieves information, refining its search based on intermediate findings. This design allows each agent to specialize in its respective task, enhancing verification accuracy and interpretability. HARIS is trained using reinforcement learning with outcome-based rewards. Experimental results on the EX-FEVER and HOVER benchmarks demonstrate that HARIS achieves strong performance, greatly advancing multi-hop claim verification.
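The two-tier loop described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the agent and corpus names, the hardcoded sub-questions, and the keyword-match "retrieval" are all stand-ins for the learned LLM policies and the real retrieval index.

```python
# Hypothetical sketch of a HARIS-style two-tier verification loop.
# All names and logic here are illustrative stand-ins, not the paper's code.
from dataclasses import dataclass, field

# Toy corpus standing in for the retrieval index (assumption).
CORPUS = {
    "capital of france": "Paris is the capital of France.",
    "paris population": "Paris has over two million residents.",
}

@dataclass
class Verification:
    verdict: str  # e.g. "SUPPORTED" / "REFUTED" / "NOT ENOUGH INFO"
    chain: list = field(default_factory=list)  # interpretable verification chain

def search_agent(question: str, max_rounds: int = 3) -> str:
    """Low-level agent: iteratively retrieve, rewriting the query between rounds."""
    query = question.lower()
    for _ in range(max_rounds):
        # Keyword match stands in for dense/sparse retrieval.
        hit = next((doc for key, doc in CORPUS.items() if key in query), None)
        if hit:
            return hit  # filtered supporting fact returned to the reasoner
        query = query.strip("?")  # stand-in for learned query rewriting
    return "NOT FOUND"

def reasoning_agent(claim: str, max_hops: int = 4) -> Verification:
    """High-level agent: build the verification chain, asking sub-questions."""
    chain = []
    # Stand-in for learned sub-question generation (one bridging fact per hop).
    sub_questions = ["capital of France", "Paris population"]
    for q in sub_questions[:max_hops]:
        evidence = search_agent(q)   # reasoning-driven search
        chain.append((q, evidence))  # search-informed reasoning
        if evidence == "NOT FOUND":
            return Verification("NOT ENOUGH INFO", chain)
    return Verification("SUPPORTED", chain)

result = reasoning_agent("Paris, the capital of France, has over two million residents.")
print(result.verdict)  # SUPPORTED
```

In the actual system both agents are trained with reinforcement learning from outcome-based rewards, so sub-question generation and query rewriting are learned behaviors rather than the fixed rules shown here.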