LocationAgent: A Hierarchical Agent for Image Geolocation via Decoupling Strategy and Evidence from Parametric Knowledge

📅 2026-01-27
🤖 AI Summary
This work addresses two weaknesses of existing image geolocation methods: factual hallucination and limited generalization in open-world scenarios. It proposes LocationAgent, a framework built on a hierarchical Reasoner-Executor-Recorder (RER) agent architecture that keeps the reasoning logic inside the model while delegating the verification of geographic evidence to external tools, forming a hypothesis-verification loop. Role separation and context compression limit drift in multi-step reasoning, and a suite of clue exploration tools supplies diverse evidence for each candidate location. The work also introduces CCL-Bench (China City Location Bench), a geolocation benchmark built to address data leakage and the scarcity of Chinese data in existing datasets. In zero-shot settings, LocationAgent is reported to outperform existing methods by at least 30%.

📝 Abstract
Image geolocation aims to infer capture locations based on visual content. Fundamentally, this constitutes a reasoning process composed of hypothesis-verification cycles, requiring models to possess both geospatial reasoning capabilities and the ability to verify evidence against geographic facts. Existing methods typically internalize location knowledge and reasoning patterns into static memory via supervised training or trajectory-based reinforcement fine-tuning. Consequently, these methods are prone to factual hallucinations and generalization bottlenecks in open-world settings or scenarios requiring dynamic knowledge. To address these challenges, we propose a Hierarchical Localization Agent, called LocationAgent. Our core philosophy is to retain hierarchical reasoning logic within the model while offloading the verification of geographic evidence to external tools. To implement hierarchical reasoning, we design the RER architecture (Reasoner-Executor-Recorder), which employs role separation and context compression to prevent the drifting problem in multi-step reasoning. For evidence verification, we construct a suite of clue exploration tools that provide diverse evidence to support location reasoning. Furthermore, to address data leakage and the scarcity of Chinese data in existing datasets, we introduce CCL-Bench (China City Location Bench), an image geolocation benchmark encompassing various scene granularities and difficulty levels. Extensive experiments demonstrate that LocationAgent significantly outperforms existing methods by at least 30% in zero-shot settings.
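
The hypothesis-verification loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify interfaces, so every name here (`Hypothesis`, `Recorder`, `reasoner`, `executor`, `locate`, the `gazetteer` tool and its lookup table) is a hypothetical stand-in. The sketch only shows the division of roles: one component proposes a candidate, another checks it against an external tool, and a third keeps a short log in place of the full transcript.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set, Tuple

# All names below are hypothetical illustrations, not the paper's API.


@dataclass
class Hypothesis:
    location: str  # candidate answer, e.g. a city name
    tool: str      # name of the external tool that should check it
    query: str     # evidence to look up, e.g. a landmark seen in the image


@dataclass
class Recorder:
    """Stores one short verdict per step instead of the full tool transcript."""
    notes: List[str] = field(default_factory=list)
    rejected: Set[str] = field(default_factory=set)

    def record(self, hyp: Hypothesis, supported: bool) -> None:
        verdict = "supported" if supported else "rejected"
        self.notes.append(f"{hyp.location}: {verdict}")
        if not supported:
            self.rejected.add(hyp.location)


def reasoner(candidates: List[str], cue: str, recorder: Recorder) -> Optional[Hypothesis]:
    """Propose the next candidate not yet rejected (stand-in for model reasoning)."""
    for city in candidates:
        if city not in recorder.rejected:
            return Hypothesis(location=city, tool="gazetteer", query=cue)
    return None


def executor(hyp: Hypothesis, tools: Dict[str, Callable[[str, str], bool]]) -> bool:
    """Verify the hypothesis with an external tool rather than model memory."""
    return tools[hyp.tool](hyp.query, hyp.location)


def locate(candidates: List[str], cue: str,
           tools: Dict[str, Callable[[str, str], bool]],
           max_steps: int = 5) -> Tuple[Optional[str], List[str]]:
    """Run propose-verify-record steps until a hypothesis is supported."""
    recorder = Recorder()
    for _ in range(max_steps):
        hyp = reasoner(candidates, cue, recorder)
        if hyp is None:
            break  # every candidate has been rejected
        supported = executor(hyp, tools)
        recorder.record(hyp, supported)
        if supported:
            return hyp.location, recorder.notes
    return None, recorder.notes


# Toy tool: a hand-written table standing in for a real map or search service.
TOY_GAZETTEER = {"Yellow Crane Tower": "Wuhan"}


def gazetteer(landmark: str, city: str) -> bool:
    return TOY_GAZETTEER.get(landmark) == city


answer, notes = locate(["Changsha", "Wuhan"], "Yellow Crane Tower",
                       {"gazetteer": gazetteer})
print(answer, notes)
```

In this sketch the first candidate is rejected by the tool, the rejection is logged, and the next proposal skips it. A real system would replace `reasoner` with a model call and `gazetteer` with actual geographic tools; the compressed `notes` list is only a rough analogue of the context compression the abstract mentions.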
Problem

Research questions and friction points this paper is trying to address.

image geolocation
factual hallucination
geospatial reasoning
evidence verification
open-world generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

hierarchical agent
decoupling strategy
geolocation reasoning
external tool integration
RER architecture
Qiujun Li
Central South University
Zijin Xiao
ByteDance (China)
Xulin Wang
ByteDance (China)
Zhidan Ma
ByteDance (China)
Cheng Yang
Central South University
Haifeng Li
Central South University
GIS, Remote sensing, Machine learning, Sparse representation, Brain Theory