🤖 AI Summary
Multimodal large language models (MLLMs) continue to suffer from visual hallucinations and logical inconsistencies in mathematical reasoning. Conventional outcome-oriented supervision and existing process reward models (PRMs) fail to resolve these issues, in particular because PRMs often blindly validate incorrect assumptions without actively verifying visual grounding. Method: We propose an active verification framework built on a tool-augmented, agent-based architecture that decouples verification from reasoning and uses an independent question-asking mechanism to mitigate confirmation bias. The framework combines multimodal process reward modeling, external tool invocation (e.g., calculators, retrieval), policy planning training, and high-quality verification trajectory construction. Results: On VisualProcessBench, our 8B-parameter model surpasses Qwen2.5-72B and InternVL-78B in verification accuracy, achieves significantly improved robustness, and generates interpretable, traceable verification paths.
📝 Abstract
Multimodal Large Language Models (MLLMs) have achieved impressive performance in mathematical reasoning, yet they remain vulnerable to visual hallucinations and logical inconsistencies that standard outcome-based supervision fails to mitigate. While Process Reward Models (PRMs) promise step-by-step verification, current approaches typically operate as scalar scorers or generative critics that suffer from sycophancy, blindly validating flawed hypotheses rather than grounding them in visual reality. To bridge this gap, we introduce TIM-PRM (Tool-Integrated Multimodal PRM), a novel agentic framework that transforms verification from a passive classification task into an active, tool-augmented investigation. TIM-PRM is trained to explicitly plan verification strategies and uses a mechanism of Independent Question Asking to query evidence via external tools, effectively decoupling verification from the reasoning context to eliminate confirmation bias. We instantiate this method by curating a high-quality dataset of tool-integrated verification trajectories. Extensive experiments on VisualProcessBench demonstrate that our 8B-parameter model surpasses existing open-source multimodal PRMs, significantly outperforming much larger models such as Qwen2.5-72B and InternVL-78B, while offering interpretable insights into the verification process.
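The idea of decoupling verification from the reasoning context can be illustrated with a small sketch. This is not the TIM-PRM implementation: the `Step`, `Query`, `verify`, `calculator`, and `read_figure` names are hypothetical, the queries are written by hand rather than planned by a trained model, and `read_figure` is a dictionary lookup standing in for a real visual tool. The point it tries to show is only that each query is phrased without the step's claimed answer, so the evidence is gathered independently and then compared with the claim.

```python
import ast
import operator
from dataclasses import dataclass
from typing import Callable, Dict, List

# Arithmetic operators the toy calculator accepts.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def calculator(expression: str) -> str:
    """Hypothetical calculator tool: evaluates +, -, *, / on numeric literals."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


# Assumption: stand-in for a perception tool that reads values off the figure.
FIGURE_FACTS = {"side length of the square": "4"}


def read_figure(question: str) -> str:
    return FIGURE_FACTS.get(question, "unknown")


@dataclass
class Query:
    tool: str      # name of the tool to call
    argument: str  # tool input, phrased without the step's claimed answer


@dataclass
class Step:
    text: str      # reasoning step produced by the policy model
    claimed: str   # value the step asserts
    query: Query   # independent question (hand-written here, not model-planned)


def verify(steps: List[Step], tools: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Gather evidence per step via a tool, then compare it with the claim."""
    trace = []
    for step in steps:
        evidence = tools[step.query.tool](step.query.argument)
        trace.append({"step": step.text, "tool": step.query.tool,
                      "question": step.query.argument, "evidence": evidence,
                      "correct": evidence == step.claimed})
    return trace


TOOLS = {"calculator": calculator, "read_figure": read_figure}
steps = [
    Step("The square has side length 5.", "5",
         Query("read_figure", "side length of the square")),
    Step("So the area is 5 * 5 = 25.", "25",
         Query("calculator", "5 * 5")),
]
trace = verify(steps, TOOLS)
for entry in trace:
    print(entry)
```

In this toy run the first step is flagged because the figure lookup disagrees with the claimed side length, while the arithmetic in the second step checks out on its own. The returned trace records the question, tool, and evidence for every step, which is a rough analogue of the interpretable verification path described above.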