Milestones over Outcome: Unlocking Geometric Reasoning with Sub-Goal Verifiable Reward

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a weakness of outcome-oriented supervision for multimodal large language models on geometric tasks: it cannot tell a coincidentally correct answer from a rigorously derived one. To remedy this, the authors replace outcome-level supervision with subgoal-level evaluation. They introduce GeoGoal, a benchmark whose proofs are converted into verifiable numeric subgoals by a formal verification data engine, and SGVR, a dense reward mechanism based on the Skeleton Rate that guides models toward formally verifiable reasoning paths. Incorporating these numeric subgoals into training reveals a critical divergence between reasoning quality and answer accuracy. Experiments report a 9.7% improvement on geometric reasoning tasks and strong generalization, with gains of 8.0% on general mathematical tasks and 2.8% on other reasoning benchmarks.

📝 Abstract
Multimodal Large Language Models (MLLMs) struggle with complex geometric reasoning, largely because "black box" outcome-based supervision fails to distinguish between lucky guesses and rigorous deduction. To address this, we introduce a paradigm shift towards subgoal-level evaluation and learning. We first construct GeoGoal, a benchmark synthesized via a rigorous formal verification data engine, which converts abstract proofs into verifiable numeric subgoals. This structure reveals a critical divergence between reasoning quality and outcome accuracy. Leveraging this, we propose the Sub-Goal Verifiable Reward (SGVR) framework, which replaces sparse signals with dense rewards based on the Skeleton Rate. Experiments demonstrate that SGVR not only enhances geometric performance (+9.7%) but also exhibits strong generalization, transferring gains to general math (+8.0%) and other general reasoning tasks (+2.8%), demonstrating broad applicability across diverse domains.
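
The contrast between a sparse outcome signal and a dense subgoal signal can be illustrated with a minimal sketch. The abstract does not define the Skeleton Rate, so the code assumes it is the fraction of reference numeric subgoals that the model's reasoning reproduces within a tolerance. The function names, the positional alignment of subgoals, and the tolerance are all hypothetical choices for illustration, not the authors' implementation.

```python
from math import isclose
from typing import Optional, Sequence


def outcome_reward(predicted: float, reference: float, rel_tol: float = 1e-3) -> float:
    """Sparse signal: 1.0 only if the final answer matches, else 0.0."""
    return 1.0 if isclose(predicted, reference, rel_tol=rel_tol) else 0.0


def skeleton_rate(
    predicted: Sequence[Optional[float]],
    reference: Sequence[float],
    rel_tol: float = 1e-3,
) -> float:
    """Dense signal (ASSUMED definition): share of reference subgoals matched.

    Subgoals are compared position by position; a missing prediction
    (None, or a shorter list) counts as a miss.
    """
    if not reference:
        return 0.0
    hits = sum(
        1
        for p, r in zip(predicted, reference)
        if p is not None and isclose(p, r, rel_tol=rel_tol)
    )
    return hits / len(reference)


# Hypothetical right-triangle problem: subgoals are the two legs and the
# hypotenuse, and the final answer is the perimeter.
reference_subgoals = [5.0, 12.0, 13.0]
reference_answer = 30.0

# A "lucky guess": correct final answer, but no intermediate value is right.
lucky = {"subgoals": [None, 7.0, None], "answer": 30.0}
# A rigorous derivation: every intermediate value checks out.
rigorous = {"subgoals": [5.0, 12.0, 13.0], "answer": 30.0}

for name, run in (("lucky", lucky), ("rigorous", rigorous)):
    print(
        name,
        outcome_reward(run["answer"], reference_answer),
        skeleton_rate(run["subgoals"], reference_subgoals),
    )
```

Under these assumptions both runs receive the same outcome reward, while the subgoal-based score separates them, which is the divergence between reasoning quality and outcome accuracy that the abstract describes.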
Problem

Research questions and friction points this paper is trying to address.

geometric reasoning
multimodal large language models
outcome-based supervision
reasoning quality
subgoal verification
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sub-Goal Verifiable Reward
Geometric Reasoning
Formal Verification
Multimodal Large Language Models
Dense Reward
Jianlong Chen
The Chinese University of Hong Kong, Shenzhen
D. Fu
Fudan University
Shengze Xu
The Chinese University of Hong Kong
Jiawei Chen
University of Science and Technology Beijing
Yuan Feng
Shanghai Jiao Tong University
Yue Yang
Shanghai Jiao Tong University
Junchi Yan
FIAPR & ICML Board Member, SJTU (2018-), SII (2024-), AWS (2019-2022), IBM (2011-2018)
Computational Intelligence, AI4Science, Machine Learning, Autonomous Driving
Hongyuan Zha
The Chinese University of Hong Kong, Shenzhen
machine learning
Renqiu Xia
SJTU
LLM, VLM