VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks

📅 2025-02-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work identifies systematic geographic biases and privacy risks in vision-language models (VLMs). Addressing a gap in prior research, it introduces GeoBench, a benchmark dataset of 1,200 diverse images annotated with fine-grained geographic metadata, and conducts zero-shot geolocation evaluation across leading VLMs, including CLIP, Flamingo, LLaVA, and Qwen-VL. The evaluation uncovers significant regional disparities: accuracy drops by 12.5% for less developed regions and by 17.0% for sparsely populated ones, and the models pervasively over-predict certain locations (e.g., images taken across Australia are frequently labeled as Sydney). The best city-level localization accuracy is only 53.8%, revealing severe geographic inequity. The paper further characterizes a class of user privacy risks arising from VLM-based geographic inference, such as unintended disclosure of sensitive location attributes. All code and data are publicly released to advance fairness and privacy research in geospatial AI.

📝 Abstract
Vision-Language Models (VLMs) have shown remarkable performance across various tasks, particularly in recognizing geographic information from images. However, significant challenges remain, including biases and privacy concerns. To systematically address these issues in the context of geographic information recognition, we introduce a benchmark dataset consisting of 1,200 images paired with detailed geographic metadata. Evaluating four VLMs, we find that while these models can recognize geographic information from images, achieving up to 53.8% accuracy in city prediction, they exhibit significant regional biases. Specifically, performance is substantially higher for economically developed and densely populated regions than for less developed (-12.5%) and sparsely populated (-17.0%) areas. Moreover, the models frequently overpredict certain locations; for instance, they consistently predict Sydney for images taken in Australia. The strong performance of VLMs also raises privacy concerns, particularly for users who share images online without the intent of being identified. Our code and dataset are publicly available at https://github.com/uscnlp-lime/FairLocator.
Problem

Research questions and friction points this paper is trying to address.

Addressing regional biases in geographic recognition
Evaluating privacy risks in image-based location identification
Developing a benchmark for VLM performance assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduce benchmark dataset for geographic recognition
Evaluate VLMs on geographic information accuracy
Address privacy concerns in VLM applications
Jingyuan Huang
Rutgers University
LLM Agents · Recommender Systems · Graph Mining
Jen-tse Huang
University of Southern California
Ziyi Liu
University of Southern California
Xiaoyuan Liu
Independent Researcher
Wenxuan Wang
University of California, Los Angeles
Jieyu Zhao
Assistant Professor at USC
Natural Language Processing · Machine Learning · Fairness in AI