🤖 AI Summary
Existing geolocation representation methods, such as SatCLIP and GeoCLIP, lose important visual information because of the way they are trained, which degrades downstream task performance. This paper analyzes, from an information-theoretic perspective, why geographic embeddings learned by contrastively aligning locations with co-located images discard visual information that many downstream tasks need. To address this, it introduces RANGE, a retrieval-augmented strategy that estimates the visual features of a location by retrieving and combining the visual features of multiple similar-looking locations. Experiments across a wide variety of geospatial tasks show that RANGE outperforms existing state-of-the-art models by significant margins on most tasks, with gains of up to 13.1% on classification tasks and up to 0.145 R² on regression tasks. The authors state that the code will be released on GitHub and the models on Hugging Face.
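For reference, the contrastive alignment that the summary refers to is typically a CLIP-style symmetric InfoNCE loss, the kind of objective SatCLIP and GeoCLIP build on. The following is a minimal sketch under stated assumptions, not the code of either method: it assumes plain-Python embedding vectors, a fixed temperature of 0.07, and that row i of the location batch and row i of the image batch form a matched pair. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def log_softmax_at(logits: Sequence[float], index: int) -> float:
    """Numerically stable log-softmax of `logits`, evaluated at one position."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return logits[index] - log_sum_exp


def symmetric_info_nce(
    loc_emb: Sequence[Vector],
    img_emb: Sequence[Vector],
    temperature: float = 0.07,  # hypothetical fixed temperature
) -> float:
    """Row i of loc_emb and row i of img_emb are a positive pair;
    every other row in the batch acts as a negative."""
    n = len(loc_emb)
    locs = [l2_normalize(v) for v in loc_emb]
    imgs = [l2_normalize(v) for v in img_emb]
    # logits[i][j] = scaled cosine similarity between location i and image j
    logits = [[dot(locs[i], imgs[j]) / temperature for j in range(n)] for i in range(n)]
    # Location -> image: pick the correct image for each location (rows).
    loc_to_img = -sum(log_softmax_at(logits[i], i) for i in range(n)) / n
    # Image -> location: pick the correct location for each image (columns).
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    img_to_loc = -sum(log_softmax_at(columns[j], j) for j in range(n)) / n
    return 0.5 * (loc_to_img + img_to_loc)
```

A perfectly aligned 2x2 batch such as `symmetric_info_nce([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])` yields a loss close to zero, while swapping the two images raises it to about 14.3 (that is, 1/0.07). The loss only rewards telling the paired image apart from the other images in the batch, which is one intuition for why such embeddings need not retain every visual detail.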
📝 Abstract
The choice of representation for geographic location significantly impacts the accuracy of models for a broad range of geospatial tasks, including fine-grained species classification, population density estimation, and biome classification. Recent works like SatCLIP and GeoCLIP learn such representations by contrastively aligning geolocation with co-located images. While these methods work exceptionally well, in this paper we posit that their current training strategies fail to fully capture important visual features. We provide an information-theoretic perspective on why the resulting embeddings discard crucial visual information that is important for many downstream tasks. To solve this problem, we propose a novel retrieval-augmented strategy called RANGE. We build our method on the intuition that the visual features of a location can be estimated by combining the visual features from multiple similar-looking locations. We evaluate our method across a wide variety of tasks. Our results show that RANGE outperforms existing state-of-the-art models by significant margins on most tasks. We show gains of up to 13.1% on classification tasks and 0.145 $R^2$ on regression tasks. All our code will be released on GitHub, and our models will be released on Hugging Face.
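The central intuition, estimating a location's visual features by combining those of multiple similar-looking locations, can be illustrated with a small retrieval step. This is a minimal sketch under stated assumptions, not RANGE's actual implementation: it assumes a hypothetical database of (location embedding, image embedding) pairs, uses cosine similarity between location embeddings as a stand-in for "similar-looking", keeps the top `k` matches, and averages their image embeddings with softmax weights controlled by a hypothetical `temperature` parameter. All function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)


def estimate_visual_features(
    query_loc: Vector,
    database: Sequence[Tuple[Vector, Vector]],  # hypothetical (location_emb, image_emb) pairs
    k: int = 3,
    temperature: float = 0.1,
) -> List[float]:
    """Estimate a location's visual features from its k most similar database entries."""
    if not database:
        raise ValueError("database must contain at least one entry")
    # Score every stored location against the query and keep the top k.
    scored = sorted(
        ((cosine(query_loc, loc), img) for loc, img in database),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    # Softmax over similarities, shifted by the best score for numerical stability.
    best = scored[0][0]
    weights = [math.exp((s - best) / temperature) for s, _ in scored]
    total = sum(weights)
    # Similarity-weighted average of the retrieved image embeddings.
    dim = len(scored[0][1])
    estimate = [0.0] * dim
    for w, (_, img) in zip(weights, scored):
        for d in range(dim):
            estimate[d] += (w / total) * img[d]
    return estimate


def retrieval_augmented_embedding(
    query_loc: Vector,
    database: Sequence[Tuple[Vector, Vector]],
    k: int = 3,
    temperature: float = 0.1,
) -> List[float]:
    """Concatenate the query's own location embedding with the retrieved visual estimate."""
    return list(query_loc) + estimate_visual_features(query_loc, database, k, temperature)
```

With a two-entry database `[([1.0, 0.0], [1.0, 0.0, 0.0]), ([0.0, 1.0], [0.0, 0.0, 1.0])]`, querying `[1.0, 0.0]` with `k=1` returns exactly `[1.0, 0.0, 0.0]`. Concatenating the location embedding with the estimate is an assumption made for illustration; a learned fusion layer would be an equally plausible way to combine the geographic and retrieved visual signals.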