🤖 AI Summary
Existing vision-language models for image geo-localization are often constrained by fixed reasoning depths or low-quality retrieval corpora, leading to hallucinations and limited localization accuracy. To address these limitations, this work proposes Geo-ADAPT, an adaptive reasoning framework built around a novel locatability scoring mechanism that dynamically adjusts reasoning depth per image. The authors construct Geo-ADAPT-51K, a locatability-stratified reasoning dataset, and combine retrieval-augmented generation with a two-stage Group Relative Policy Optimization (GRPO) curriculum guided by tailored rewards. The resulting model achieves state-of-the-art performance across multiple geo-localization benchmarks while substantially reducing hallucinations and improving both localization accuracy and reasoning efficiency.
📝 Abstract
The emergence of Vision-Language Models (VLMs) has introduced new paradigms for global image geo-localization through retrieval-augmented generation (RAG) and reasoning-driven inference. However, RAG methods are constrained by retrieval database quality, while reasoning-driven approaches fail to internalize image locatability, relying on inefficient, fixed-depth reasoning paths that increase hallucinations and degrade accuracy. To overcome these limitations, we introduce an Optimized Locatability Score that quantifies an image's suitability for deep reasoning in geo-localization. Using this metric, we curate Geo-ADAPT-51K, a locatability-stratified reasoning dataset enriched with augmented reasoning trajectories for complex visual scenes. Building on this foundation, we propose a two-stage Group Relative Policy Optimization (GRPO) curriculum with customized reward functions that regulate adaptive reasoning depth, visual grounding, and hierarchical geographical accuracy. Our framework, Geo-ADAPT, learns an adaptive reasoning policy, achieves state-of-the-art performance across multiple geo-localization benchmarks, and substantially reduces hallucinations by reasoning both adaptively and efficiently.
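GRPO, mentioned in the abstract, replaces a learned value critic with per-group reward normalization: several responses are sampled for the same prompt, and each response's advantage is its reward standardized against the group's mean and standard deviation. A minimal sketch of that advantage computation is below; the composite reward combining the abstract's three criteria (reasoning depth, visual grounding, hierarchical geographic accuracy) is an illustrative assumption, and its component names and weights are hypothetical, not the paper's actual reward design.

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize each sampled response's reward
    against the mean and std of its own group (responses to one prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All responses scored identically: no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]

def composite_reward(depth_score, grounding_score, geo_accuracy,
                     w_depth=0.2, w_ground=0.3, w_geo=0.5):
    """Hypothetical weighted mix of the three reward criteria the abstract
    names; the weights here are placeholders, not from the paper."""
    return w_depth * depth_score + w_ground * grounding_score + w_geo * geo_accuracy

# Four sampled responses to one image prompt: responses scoring above the
# group mean get positive advantages, those below get negative ones.
advs = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Responses with above-average composite reward within their group are reinforced, which is how the curriculum can push the policy toward deeper reasoning only when it actually improves geographic accuracy.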