🤖 AI Summary
This work presents the first systematic evaluation of the privacy leakage risks posed by agentic multimodal large reasoning models (e.g., ChatGPT o3) in image geolocation: the unintended disclosure of individuals' precise locations and identities from real-world photographs. Method: We construct the first benchmark dataset, comprising 50 privacy-sensitive images, and employ controlled visual prompting to identify critical localization cues (e.g., street layout, front-yard design). We further propose an explainable defense that occludes the salient visual regions identified via interpretability analysis. Contribution/Results: Experiments show that ChatGPT o3 achieves street-level geolocation accuracy (within one mile) for 60% of test samples, and that occluding the model-identified key regions significantly degrades localization performance, demonstrating that the vulnerability can be mitigated. This study uncovers a novel geographic privacy threat inherent in multimodal reasoning models and provides the first explainable, image-grounded defense framework for real-world photographic data.
📝 Abstract
The increasing capabilities of agentic multi-modal large reasoning models, such as ChatGPT o3, have raised critical concerns regarding privacy leakage through inadvertent image geolocation. In this paper, we conduct the first systematic and controlled study of the privacy risks associated with the visual reasoning abilities of ChatGPT o3. We manually collect and construct a dataset of 50 real-world images that feature individuals alongside privacy-relevant environmental elements, capturing realistic and sensitive scenarios for analysis. Our experimental evaluation reveals that ChatGPT o3 can predict user locations with high precision, achieving street-level accuracy (within one mile) in 60% of cases. Through analysis, we identify key visual cues, including street layout and front-yard design, that significantly contribute to the model's inference success. Additionally, targeted occlusion experiments demonstrate that masking critical features substantially reduces geolocation accuracy, providing insights into potential defense mechanisms. Our findings highlight an urgent need for privacy-aware development of agentic multi-modal large reasoning models, particularly in applications involving private imagery.
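The targeted-occlusion defense described above can be sketched minimally: mask out the image regions (e.g., street layout, front-yard design) that interpretability analysis flags as key geolocation cues before the image is shared or sent to a model. The function name, the (top, left, bottom, right) box format, and the grayscale-grid image representation here are illustrative assumptions, not the paper's actual implementation.

```python
def occlude_regions(image, boxes, fill=0):
    """Return a copy of `image` (a 2-D grid of pixel values) with each
    (top, left, bottom, right) box in `boxes` overwritten by `fill`.

    In practice the boxes would come from an interpretability method
    that highlights the regions driving the model's location guess.
    """
    masked = [row[:] for row in image]  # copy rows so the input is untouched
    for top, left, bottom, right in boxes:
        for r in range(top, bottom):
            for c in range(left, right):
                masked[r][c] = fill
    return masked

# Example: a 4x4 "image" with one 2x2 salient region blacked out.
img = [[1, 2, 3, 4],
       [5, 6, 7, 8],
       [9, 10, 11, 12],
       [13, 14, 15, 16]]
masked = occlude_regions(img, [(1, 1, 3, 3)])
```

On real photographs the same idea would be applied per-channel to pixel arrays (e.g., via NumPy slicing or Pillow's `ImageDraw`), but the grid form keeps the occlusion logic itself explicit.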