GeoVLM-R1: Reinforcement Fine-Tuning for Improved Remote Sensing Reasoning

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address poor model adaptability and oversimplified reward signals in multi-task remote sensing image reasoning, this paper proposes a task-aware reinforcement learning (RL) post-training framework. It introduces fine-grained, task-specific rewards (covering object detection, image captioning, change detection, and spatiotemporal analysis) into the RL post-training pipeline of vision-language models (VLMs), jointly optimizing reward modeling and policy learning to enhance inference robustness and training stability. Evaluated on multiple benchmarks, the method consistently outperforms both general-purpose and domain-specific remote sensing VLMs, achieving state-of-the-art performance with strong cross-dataset generalization. Key contributions include: (i) the first multi-granularity reward design tailored to Earth observation tasks; (ii) a task-driven RL post-training paradigm for remote sensing VLMs; and (iii) a unified framework that simultaneously strengthens cross-task semantic understanding and logical reasoning capabilities.

📝 Abstract

Recent advances in reinforcement learning (RL) have delivered strong reasoning capabilities in natural image domains, yet their potential for Earth Observation (EO) remains largely unexplored. EO tasks introduce unique challenges, spanning referred object detection, image or region captioning, change detection, grounding, and temporal analysis, that demand task-aware reasoning. We propose a novel post-training framework that incorporates task-aware rewards to enable effective adaptation of reasoning-based RL models to diverse EO tasks. This training strategy enhances reasoning capabilities for remote sensing images, stabilizes optimization, and improves robustness. Extensive experiments across multiple EO benchmarks show consistent performance gains over state-of-the-art generic and specialized vision-language models. Code and models will be released publicly at https://mustansarfiaz.github.io/GeoVLM-R1/.

Problem

Research questions and friction points this paper is trying to address.

Enhancing reasoning capabilities for remote sensing images

Adapting reinforcement learning models to Earth Observation tasks

Improving robustness and optimization stability in EO analysis

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement fine-tuning for remote sensing reasoning

Task-aware rewards adapt models to Earth Observation

Stabilizes optimization and enhances image reasoning robustness
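
To make the idea of task-aware rewards concrete, here is a minimal Python sketch of a reward dispatcher that scores a model output differently depending on the EO task. It is an illustration only: the page does not give the paper's actual reward definitions, so the task names, the choice of IoU for box outputs and token-level F1 for text outputs, and every function name below (`box_iou`, `token_f1`, `task_reward`) are assumptions.

```python
from collections import Counter
from typing import Callable, Dict, Sequence

Box = Sequence[float]  # (x_min, y_min, x_max, y_max); this format is an assumption


def box_iou(pred: Box, target: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes, in [0, 1]."""
    inter_w = max(0.0, min(pred[2], target[2]) - max(pred[0], target[0]))
    inter_h = max(0.0, min(pred[3], target[3]) - max(pred[1], target[1]))
    inter = inter_w * inter_h
    area_pred = max(0.0, pred[2] - pred[0]) * max(0.0, pred[3] - pred[1])
    area_target = max(0.0, target[2] - target[0]) * max(0.0, target[3] - target[1])
    union = area_pred + area_target - inter
    return inter / union if union > 0 else 0.0


def token_f1(pred: str, target: str) -> float:
    """Token-level F1 between two whitespace-tokenized, lowercased strings."""
    pred_tokens = pred.lower().split()
    target_tokens = target.lower().split()
    if not pred_tokens or not target_tokens:
        return 0.0
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical mapping from task name to reward function (not from the paper).
TASK_REWARDS: Dict[str, Callable] = {
    "detection": box_iou,
    "grounding": box_iou,
    "captioning": token_f1,
    "change_detection": token_f1,  # assumes the change is described in text
}


def task_reward(task: str, prediction, target) -> float:
    """Score a prediction with the reward function registered for its task."""
    if task not in TASK_REWARDS:
        raise ValueError(f"no reward registered for task: {task!r}")
    return TASK_REWARDS[task](prediction, target)
```

In an RL post-training loop, a scalar like this would be computed per sampled response and passed to the policy update. Keeping one reward function per task, rather than a single generic correctness check, is the design choice the sketch is meant to illustrate.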

🔎 Similar Papers

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model