AI Summary
Existing active geo-localization (AGL) methods rely on distance estimation, which limits generalization to unseen targets and environments and undermines robustness. This paper proposes a curiosity-driven, distance-free AGL framework. It introduces an intrinsic curiosity reward based on prediction error, eliminating dependence on explicit distance supervision, and integrates reinforcement learning with lightweight environment modeling to enable target-agnostic, context-aware active exploration. Experiments on four benchmark datasets demonstrate substantial improvements in cross-target and cross-environment generalization: the method achieves a 12.7% average accuracy gain on unseen-target localization and a 31% increase in exploration path diversity, validating its effectiveness and robustness.
Abstract
Active Geo-localization (AGL) is the task of localizing a goal, represented in various modalities (e.g., aerial images, ground-level images, or text), within a predefined search area. Current methods approach AGL as a goal-reaching reinforcement learning (RL) problem with a distance-based reward: they localize the goal by implicitly learning to minimize the relative distance to it. However, when distance estimation becomes challenging, or when the agent encounters unseen targets and environments, it exhibits reduced robustness and generalization due to the less reliable exploration strategy learned during training. In this paper, we propose GeoExplorer, an AGL agent that incorporates curiosity-driven exploration through intrinsic rewards. Unlike distance-based rewards, our curiosity-driven reward is goal-agnostic, enabling robust, diverse, and contextually relevant exploration grounded in effective environment modeling. Extensive experiments across four AGL benchmarks demonstrate the effectiveness and generalization ability of GeoExplorer in diverse settings, particularly when localizing unfamiliar targets in unseen environments.
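The abstract does not specify the exact form of the intrinsic reward, but the standard curiosity formulation it alludes to rewards the agent with the prediction error of a learned forward dynamics model: transitions the model predicts poorly are "novel" and earn a high reward, with no reference to the goal or any distance label. The toy model below (a single linear forward model over observation embeddings; all names and dimensions are hypothetical, not from the paper) is a minimal sketch of that idea, assuming a squared-error curiosity signal:

```python
import numpy as np

rng = np.random.default_rng(0)

class ForwardModel:
    """Toy linear forward model: predicts the next observation
    embedding from the current embedding and the action taken.
    (Illustrative stand-in for the paper's environment model.)"""

    def __init__(self, emb_dim, act_dim, lr=0.01):
        self.W = rng.normal(scale=0.1, size=(emb_dim + act_dim, emb_dim))
        self.lr = lr

    def predict(self, emb, act):
        # Predicted next-state embedding for transition (emb, act).
        return np.concatenate([emb, act]) @ self.W

    def update(self, emb, act, next_emb):
        # One gradient step on the squared prediction error, so the
        # model (and hence the curiosity reward) adapts online.
        x = np.concatenate([emb, act])
        err = self.predict(emb, act) - next_emb
        self.W -= self.lr * np.outer(x, err)

def curiosity_reward(model, emb, act, next_emb):
    # Goal-agnostic intrinsic reward: the forward model's prediction
    # error. Large in poorly modeled (novel) regions of the search
    # area, shrinking as the agent learns the environment dynamics.
    return float(np.sum((model.predict(emb, act) - next_emb) ** 2))
```

Because the reward is computed purely from the agent's own model of the environment, it needs no distance labels and transfers unchanged to unseen targets, which is the property the paper attributes to its curiosity-driven design.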