GeoSolver: Scaling Test-Time Reasoning in Remote Sensing with Fine-Grained Process Supervision

📅 2026-03-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the susceptibility of existing vision-language models to visual hallucinations and the lack of fine-grained process supervision in multi-step reasoning over remote sensing imagery. To this end, we introduce Geo-PRM-2M, a large-scale dataset with process-level annotations, and propose GeoPRM, a token-level process reward model. We further design Process-Aware Tree-GRPO, a reinforcement learning algorithm that integrates entropy-guided Monte Carlo tree search with a visual hallucination injection mechanism to optimize verifiable reasoning trajectories. Our approach is the first to combine fine-grained process supervision with tree-structured reinforcement learning, enabling test-time reasoning expansion and demonstrating cross-model generalizability. The resulting GeoSolver-9B achieves state-of-the-art performance on multiple remote sensing benchmarks, while GeoPRM significantly enhances both reasoning fidelity and the general capabilities of off-the-shelf vision-language models.
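The summary names entropy-guided tree search but does not spell out how entropy steers the search. One plausible reading, given here only as a minimal sketch and not as the authors' procedure, is that the tree branches at the positions where the model's next-token distribution is most uncertain. The function names, the toy distributions, and the choice of `k` below are all hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def pick_branch_points(step_distributions, k=2):
    """Return the indices of the k highest-entropy positions, in ascending order.

    Assumption: high entropy marks positions where exploring alternative
    continuations is most informative. The source does not confirm this rule.
    """
    entropies = [token_entropy(p) for p in step_distributions]
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k])

# Toy next-token distributions for four positions (illustrative values only).
dists = [
    [0.97, 0.01, 0.01, 0.01],  # confident
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [0.70, 0.10, 0.10, 0.10],  # fairly confident
    [0.40, 0.40, 0.10, 0.10],  # torn between two options
]
branch_points = pick_branch_points(dists, k=2)
print(branch_points)
```

In this toy case the uniform position and the two-way split are selected, since they carry the most uncertainty.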

📝 Abstract
While Vision-Language Models (VLMs) have significantly advanced remote sensing interpretation, enabling them to perform complex, step-by-step reasoning remains highly challenging. Recent efforts to introduce Chain-of-Thought (CoT) reasoning to this domain have shown promise, yet ensuring the visual faithfulness of these intermediate steps remains a critical bottleneck. To address this, we introduce GeoSolver, a novel framework that transitions remote sensing reasoning toward verifiable, process-supervised reinforcement learning. We first construct Geo-PRM-2M, a large-scale, token-level process supervision dataset synthesized via entropy-guided Monte Carlo Tree Search (MCTS) and targeted visual hallucination injection. Building upon this dataset, we train GeoPRM, a token-level process reward model (PRM) that provides granular faithfulness feedback. To effectively leverage these verification signals, we propose Process-Aware Tree-GRPO, a reinforcement learning algorithm that integrates tree-structured exploration with a faithfulness-weighted reward mechanism to precisely assign credit to intermediate steps. Extensive experiments demonstrate that our resulting model, GeoSolver-9B, achieves state-of-the-art performance across diverse remote sensing benchmarks. Crucially, GeoPRM unlocks robust Test-Time Scaling (TTS). Serving as a universal geospatial verifier, it seamlessly scales the performance of GeoSolver-9B and directly enhances general-purpose VLMs, highlighting its remarkable cross-model generalization.
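To make the test-time scaling idea concrete, the sketch below shows a generic best-of-N selection loop in which a verifier scores each sampled reasoning trajectory and the highest-scoring one is kept. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how token-level scores are aggregated or which selection strategy is used, so the `min` aggregation, the function names, and the toy scores are all hypothetical.

```python
def trajectory_score(token_scores):
    """Collapse token-level faithfulness scores into one trajectory score.

    Assumption: taking the minimum, so that a single unfaithful token
    drags the whole trajectory down. Other aggregations are equally plausible.
    """
    return min(token_scores) if token_scores else 0.0

def best_of_n(candidates, score_tokens):
    """Return the candidate whose aggregated verifier score is highest.

    `score_tokens` stands in for a process reward model: it maps a candidate
    to a list of per-token scores in [0, 1].
    """
    scored = [(trajectory_score(score_tokens(c)), c) for c in candidates]
    return max(scored, key=lambda pair: pair[0])[1]

# Toy candidates and hand-written scores standing in for a real verifier.
candidates = [
    "Step 1: locate the apron. Step 2: count 3 aircraft.",
    "Step 1: locate the runway. Step 2: count 5 aircraft.",
]
toy_scores = {
    candidates[0]: [0.90, 0.80, 0.85],
    candidates[1]: [0.95, 0.20, 0.90],  # one low-scoring (possibly hallucinated) token
}
best = best_of_n(candidates, toy_scores.__getitem__)
print(best)
```

Sampling more candidates gives the verifier more trajectories to choose from, which is the sense in which accuracy can scale with test-time compute. The verifier only needs the candidate text, so the same loop can wrap any generator.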
Problem

Research questions and friction points this paper is trying to address.

remote sensing
visual faithfulness
test-time reasoning
chain-of-thought
process supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process Supervision
Test-Time Scaling
Vision-Language Models
Reinforcement Learning
Remote Sensing
Lang Sun
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Ronghao Fu
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Zhuoran Duan
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Haoran Liu
Ph.D. Student, Department of Computer Science & Engineering, Texas A&M University
LLMs, Graph/Geometric Learning, AI for Science, Generative Models
Xueyan Liu
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University
Bo Yang
College of Computer Science and Technology, Jilin University, Changchun 130012, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University