Object-Level Explanations for Image Geolocation Models: a GeoGuessr use-case

📅 2026-04-29

🤖 AI Summary
Existing attribution methods for image geolocation models struggle to reveal whether predictions rely on human-interpretable, object-level visual cues. This work proposes an object-centric analysis pipeline that first extracts salient regions from attribution maps such as Grad-CAM, then decomposes them into object-like elements using image segmentation. The predictive relevance of these elements is evaluated through crop-based deletion and insertion tests, comparing attribution-guided crops with random crops of similar coverage. Experiments on a three-country benchmark show that attribution-guided crops consistently retain more predictive information than random crops, suggesting that attribution maps can be decomposed into interpretable, object-level elements.
📝 Abstract
When humans play geolocation games such as GeoGuessr, they rely on concrete visual cues, such as road markings, vegetation, or architectural details, to infer where an image was captured. Whether image geolocation models rely on similar object-level evidence remains difficult to determine: attribution methods like Grad-CAM typically highlight diffuse regions rather than coherent visual entities, so model predictions are hard to link to specific objects or perceptible patterns. In this work, we propose an object-centric analysis pipeline to investigate the visual evidence used by geolocation models. Starting from attribution maps, we extract salient regions and segment them into object-like elements. We evaluate their predictive relevance through deletion and insertion tests, comparing attribution-guided crops to randomly selected regions with similar coverage. Experiments on a three-country benchmark show that attribution-guided crops consistently retain more information for the model's prediction than random crops. These results suggest that attribution maps can be decomposed into interpretable, perceptible elements, providing a step toward object-level analysis of geolocation models.
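The crop-based deletion and insertion comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single bounding box around high-attribution pixels instead of segmentation into object-like elements, and every name here (`salient_bbox`, `random_bbox`, `insertion_image`, `deletion_image`, `toy_score`, the `rel_threshold` parameter) is hypothetical. The `toy_score` function is only a stand-in for a geolocation model's confidence in its predicted class.

```python
import numpy as np


def salient_bbox(attr, rel_threshold=0.5):
    """Box (top, left, bottom, right) around pixels whose attribution is at
    least rel_threshold * max(attr). The threshold rule is an assumption."""
    rows, cols = np.where(attr >= rel_threshold * attr.max())
    return (int(rows.min()), int(cols.min()),
            int(rows.max()) + 1, int(cols.max()) + 1)


def random_bbox(shape, height, width, rng):
    """Random box of the given size, so coverage matches the guided crop."""
    top = int(rng.integers(0, shape[0] - height + 1))
    left = int(rng.integers(0, shape[1] - width + 1))
    return (top, left, top + height, left + width)


def insertion_image(image, box, fill=0.0):
    """Insertion test input: keep only the crop, blank everything else."""
    top, left, bottom, right = box
    out = np.full_like(image, fill)
    out[top:bottom, left:right] = image[top:bottom, left:right]
    return out


def deletion_image(image, box, fill=0.0):
    """Deletion test input: blank the crop, keep everything else."""
    top, left, bottom, right = box
    out = image.copy()
    out[top:bottom, left:right] = fill
    return out


def toy_score(image):
    """Hypothetical stand-in for a model score: it only 'looks at' one
    fixed patch, mimicking a model that relies on a localized cue."""
    return float(image[2:5, 3:6].sum())


# Toy data: one bright patch; the attribution map is assumed to match it.
image = np.zeros((8, 8))
image[2:5, 3:6] = 1.0
attribution = image.copy()

guided = salient_bbox(attribution)
height, width = guided[2] - guided[0], guided[3] - guided[1]
rand = random_bbox(image.shape, height, width, np.random.default_rng(0))

for name, box in (("guided", guided), ("random", rand)):
    print(name,
          "insertion score:", toy_score(insertion_image(image, box)),
          "deletion score:", toy_score(deletion_image(image, box)))
```

In this setup, a crop is informative when its insertion score stays close to the full-image score and its deletion score drops well below it. Matching the random box to the guided box's size keeps coverage comparable, so any difference reflects where the crop is placed and not how much of the image it covers.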
Problem

Research questions and friction points this paper is trying to address.

image geolocation
object-level explanation
attribution methods
visual cues
model interpretability
Innovation

Methods, ideas, or system contributions that make the work stand out.

object-centric explanation
image geolocation
attribution maps
visual interpretability
GeoGuessr