GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

📅 2026-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the high cost of human annotation in geospatial reasoning, which stems from a vast, combinatorial question space. The authors propose GeoX, a self-play framework in which a single multimodal policy poses spatial reasoning problems as executable programs and also solves them, under three reasoning modes: abduction, deduction, and induction. A verifier executes each program to produce a reinforcement learning reward that jointly optimizes question generation and answering, removing the need for large-scale human annotation. GeoX improves its base vision-language models by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated samples. The authors also release a geospatial understanding benchmark accumulated through self-play.

📝 Abstract

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to compute a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated samples. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.
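
To make the idea of a program-based verifiable reward concrete, here is a minimal toy sketch in Python. It is not the GeoX implementation: the `Obj` scene representation, the primitives `select`, `left_of`, and `count`, the three `verify_*` functions, and the 0/1 reward values are all hypothetical names and assumptions chosen for illustration. A hand-written list of objects stands in for the output of an image understanding tool.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical scene representation (assumption): a hand-written object list
# standing in for what an image understanding tool might return.
@dataclass(frozen=True)
class Obj:
    label: str
    x: float  # horizontal position of the object's centre
    y: float  # vertical position of the object's centre

Scene = List[Obj]
Program = Callable[[Scene], object]

# Hypothetical spatial primitives that a proposed program may call.
def select(scene: Scene, label: str) -> Scene:
    return [o for o in scene if o.label == label]

def left_of(a: Obj, b: Obj) -> bool:
    return a.x < b.x

def count(objs: Scene) -> int:
    return len(objs)

def verify_deduction(program: Program, scene: Scene, answer: object) -> float:
    """Deduction: given program and input, the solver predicts the output."""
    try:
        truth = program(scene)
    except Exception:
        return 0.0  # a program that fails to execute earns no reward
    return 1.0 if answer == truth else 0.0

def verify_abduction(program: Program, proposed: Scene, target: object) -> float:
    """Abduction: given program and output, the solver proposes an input."""
    try:
        return 1.0 if program(proposed) == target else 0.0
    except Exception:
        return 0.0

def verify_induction(candidate: Program,
                     examples: List[Tuple[Scene, object]]) -> float:
    """Induction: the solver proposes a program; score the fraction of
    input/output examples that it reproduces."""
    if not examples:
        return 0.0
    hits = 0
    for scene, expected in examples:
        try:
            hits += candidate(scene) == expected
        except Exception:
            pass  # an example on which the candidate crashes counts as a miss
    return hits / len(examples)

# An example "proposed problem": how many buildings lie left of the river?
def buildings_left_of_river(scene: Scene) -> int:
    river = select(scene, "river")[0]
    buildings = select(scene, "building")
    return count([b for b in buildings if left_of(b, river)])

scene = [
    Obj("river", 50, 10),
    Obj("building", 20, 5),
    Obj("building", 80, 7),
    Obj("building", 30, 40),
]

print(verify_deduction(buildings_left_of_river, scene, 2))  # correct answer
print(verify_deduction(buildings_left_of_river, scene, 3))  # wrong answer
```

Because the ground truth comes from executing the program rather than from a human label, the reward can be checked automatically for every generated problem. In a reinforcement learning setup these scalar rewards would feed the policy update for both the proposer and solver roles; that step is omitted here.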

Problem

Research questions and friction points this paper is trying to address.

geospatial reasoning

spatial logic

self-play

verifiable rewards

multimodal policy

Innovation

Methods, ideas, or system contributions that make the work stand out.

self-play

geospatial reasoning

verifiable rewards