From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models

📅 2025-08-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work systematically evaluates the image geolocalization ability of large language models (LLMs). It introduces IMAGEO-Bench, a benchmark built for LLMs that comprises three datasets: global street-level imagery, U.S. points of interest, and a private collection of unseen images. The benchmark measures localization accuracy, distance error, geospatial bias, and the reasoning process. The methodology combines LLM inference on images with regression-based diagnostics that relate performance to scene attributes (e.g., urbanicity, outdoor visibility, landmark saliency). Experiments on 10 models show that closed-source models generally outperform open-source ones, that accuracy is higher in high-resource regions such as North America and Western Europe, and that geolocalization succeeds more often in urban, outdoor, and landmark-rich scenes. The study provides a reproducible, fine-grained evaluation framework for LLMs' geographic understanding and empirically grounded insight into their spatial reasoning.

📝 Abstract

Image geolocalization, the task of identifying the geographic location depicted in an image, is important for applications in crisis response, digital forensics, and location-based intelligence. While recent advances in large language models (LLMs) offer new opportunities for visual reasoning, their ability to perform image geolocalization remains underexplored. In this study, we introduce a benchmark called IMAGEO-Bench that systematically evaluates accuracy, distance error, geospatial bias, and reasoning process. Our benchmark includes three diverse datasets covering global street scenes, points of interest (POIs) in the United States, and a private collection of unseen images. Through experiments on 10 state-of-the-art LLMs, including both open- and closed-source models, we reveal clear performance disparities, with closed-source models generally showing stronger reasoning. Importantly, we uncover geospatial biases as LLMs tend to perform better in high-resource regions (e.g., North America, Western Europe, and California) while exhibiting degraded performance in underrepresented areas. Regression diagnostics demonstrate that successful geolocalization is primarily dependent on recognizing urban settings, outdoor environments, street-level imagery, and identifiable landmarks. Overall, IMAGEO-Bench provides a rigorous lens into the spatial reasoning capabilities of LLMs and offers implications for building geolocation-aware AI systems.
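The abstract names accuracy and distance error as metrics but this page does not say how they are computed. A minimal sketch, assuming distance error is the great-circle (haversine) distance between predicted and true coordinates and accuracy is the share of predictions within a distance threshold; the function names, the Earth-radius constant, and the threshold are illustrative assumptions, not taken from the paper:

```python
import math

# Mean Earth radius in kilometres (an assumed constant for this sketch).
EARTH_RADIUS_KM = 6371.0088


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def accuracy_within(errors_km, threshold_km):
    """Fraction of predictions whose distance error is at most threshold_km."""
    if not errors_km:
        raise ValueError("errors_km must not be empty")
    return sum(e <= threshold_km for e in errors_km) / len(errors_km)


# Hypothetical usage: one prediction (Paris) scored against a true location (London).
error = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
share_within_25km = accuracy_within([error, 3.2, 18.0], threshold_km=25.0)
```

Reporting both numbers is useful because a threshold accuracy hides how far off the misses are, while a mean distance error can be dominated by a few predictions on the wrong continent.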

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' image geolocalization accuracy and biases

Assessing performance disparities between open- and closed-source LLMs

Identifying key factors for successful image geolocalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

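IMAGEO-Bench evaluates LLMs' geolocalization accuracy, distance error, geospatial bias, and reasoning process

Tests 10 open- and closed-source models on global street scenes, U.S. points of interest, and a private collection of unseen images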

Reveals geospatial biases in model performance
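One simple way to surface the kind of geospatial bias described above is to break accuracy down by region and look at the spread. The sketch below is an illustration only: the record format, the region labels, and the gap statistic are assumptions, and the example records are made up rather than drawn from the paper's results.

```python
from collections import defaultdict


def accuracy_by_region(records):
    """Map each region to its accuracy, given (region, is_correct) pairs."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for region, is_correct in records:
        totals[region] += 1
        hits[region] += bool(is_correct)
    return {region: hits[region] / totals[region] for region in totals}


def accuracy_gap(per_region):
    """Difference between the best and worst regional accuracy."""
    values = list(per_region.values())
    if not values:
        raise ValueError("per_region must not be empty")
    return max(values) - min(values)


# Hypothetical records, for illustration only.
records = [
    ("North America", True),
    ("North America", True),
    ("Africa", False),
    ("Africa", True),
]
per_region = accuracy_by_region(records)
gap = accuracy_gap(per_region)
```

A large gap on its own does not separate model bias from dataset difficulty, so a breakdown like this is best read alongside the scene attributes (urban, outdoor, landmark-rich) that the paper's regression diagnostics consider.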

🔎 Similar Papers

GeoReasoner: Geo-localization with Reasoning in Street Views using a Large Vision-Language Model