GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning

📅 2025-09-29
🤖 AI Summary
To address poor model adaptability and oversimplified reward signals in multi-task remote sensing image reasoning, this paper proposes a task-aware reinforcement learning (RL) post-training framework. It introduces fine-grained, task-specific rewards (covering referred object detection, image or region captioning, change detection, grounding, and temporal analysis) into the RL post-training pipeline of vision-language models (VLMs), jointly optimizing reward modeling and policy learning to enhance inference robustness and training stability. Evaluated on multiple EO benchmarks, the method consistently outperforms both general-purpose and domain-specific remote sensing VLMs, achieving state-of-the-art performance with strong cross-dataset generalization. Key contributions include: (i) the first multi-granularity reward design tailored to Earth observation tasks; (ii) a task-driven RL post-training paradigm for remote sensing VLMs; and (iii) a unified framework that simultaneously strengthens cross-task semantic understanding and logical reasoning capabilities.

📝 Abstract
Recent advances in reinforcement learning (RL) have delivered strong reasoning capabilities in natural image domains, yet their potential for Earth Observation (EO) remains largely unexplored. EO tasks introduce unique challenges, spanning referred object detection, image or region captioning, change detection, grounding, and temporal analysis, that demand task-aware reasoning. We propose a novel post-training framework that incorporates task-aware rewards to enable effective adaptation of reasoning-based RL models to diverse EO tasks. This training strategy enhances reasoning capabilities for remote sensing images, stabilizes optimization, and improves robustness. Extensive experiments across multiple EO benchmarks show consistent performance gains over state-of-the-art generic and specialized vision-language models. Code and models will be released publicly at https://mustansarfiaz.github.io/GeoVLM-R1/.
Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning capabilities for remote sensing images
Adapting reinforcement learning models to Earth Observation tasks
Improving robustness and optimization stability in EO analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement fine-tuning for remote sensing reasoning
Task-aware rewards adapt models to Earth Observation
Stabilizes optimization and enhances image reasoning robustness
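
The idea of a task-aware reward can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the task names, the `task_aware_reward` dispatcher, and the choice of box IoU for localization tasks and token-level F1 for text tasks are hypothetical stand-ins for whatever reward functions the authors actually use.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_iou(pred: Box, target: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_target = max(0.0, target[2] - target[0]) * max(0.0, target[3] - target[1])
    union = area_pred + area_target - inter
    return inter / union if union > 0 else 0.0


def token_f1(pred: str, target: str) -> float:
    """Token-level F1 based on multiset overlap of whitespace tokens."""
    pred_tokens, target_tokens = pred.lower().split(), target.lower().split()
    if not pred_tokens or not target_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical mapping from task name to reward function (illustrative only).
REWARDS: Dict[str, Callable[..., float]] = {
    "referred_detection": box_iou,
    "grounding": box_iou,
    "captioning": token_f1,
    "change_detection": token_f1,
}


def task_aware_reward(task: str, prediction, reference) -> float:
    """Score a model output with the reward registered for its task."""
    if task not in REWARDS:
        raise KeyError(f"no reward registered for task {task!r}")
    return REWARDS[task](prediction, reference)


if __name__ == "__main__":
    print(task_aware_reward("grounding", (0, 0, 2, 2), (1, 1, 3, 3)))
    print(task_aware_reward("captioning", "two ships", "two ships near a pier"))
```

Dispatching on the task name keeps every reward in the range 0 to 1 while letting each task be scored by a metric suited to its output type, which is one plausible reading of "task-aware" in this context.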
Authors

Mustansar Fiaz
IBM Research
Deep Learning, Machine Learning, Computer Vision
Hiyam Debary
IBM Research
Paolo Fraccaro
IBM Research
Danda Paudel
INSAIT
Luc Van Gool
professor computer vision INSAIT Sofia University, em. KU Leuven, em. ETHZ, Toyota Lab TRACE
computer vision, machine learning, AI, autonomous cars, cultural heritage
Fahad Khan
MBZUAI, Linköping University
Salman Khan
MBZUAI, ANU Australia