🤖 AI Summary
Images shared on social media often carry latent geographic privacy risks, yet existing large vision-language models (LVLMs) are not optimized for geolocation inference, which limits their performance and makes the risk hard to assess accurately. To address this, we propose a human-inspired, reasoning-chain-based LVLM agent framework that integrates visual reverse search, external knowledge retrieval, and an adaptive multi-step reasoning mechanism, so the agent can adjust its strategy and invoke tools dynamically. Compared to baseline models, our approach improves country-level localization accuracy by over 11.1%, improves fine-grained (e.g., city-level) localization accuracy by around 5.2%, and reduces the unknown-prediction rate by more than 50.6%. These results indicate substantially better robustness and practicality. The framework's reasoning is also interpretable, scalable, and modular, offering a new approach to assessing geographic privacy risk in images.
📝 Abstract
Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and generalized poorly, the rise of large vision-language models (LVLMs) now lets even ordinary users geolocate images accurately. However, existing approaches are not optimized for this task. To explore the full potential of LVLMs and the associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty, and it is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks, it achieves an improvement of over 11.1% compared to baseline LVLMs, and even at finer-grained levels it still provides a performance gain of around 5.2%. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness against them, highlighting the need for more effective privacy safeguards.
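The accuracy and "unknown"-rate figures above can be made concrete with a small sketch. This is only an illustration under stated assumptions, not the paper's evaluation code: the function names, the use of `None` to mark an "unknown" prediction, the case-insensitive exact match, and the toy labels are all hypothetical. The abstract does not say whether the reported changes are absolute or relative; the sketch computes a relative change as one possible reading.

```python
from typing import Optional, Sequence


def accuracy(preds: Sequence[Optional[str]], truths: Sequence[str]) -> float:
    """Fraction of images whose predicted label matches the ground truth.

    Assumption: None marks an "unknown" prediction and counts as a miss.
    """
    if len(preds) != len(truths) or not truths:
        raise ValueError("preds and truths must be non-empty and of equal length")
    hits = sum(
        p is not None and p.strip().lower() == t.strip().lower()
        for p, t in zip(preds, truths)
    )
    return hits / len(truths)


def unknown_rate(preds: Sequence[Optional[str]]) -> float:
    """Fraction of images for which the model declined to name a location."""
    if not preds:
        raise ValueError("preds must be non-empty")
    return sum(p is None for p in preds) / len(preds)


def relative_change(baseline: float, new: float) -> float:
    """Signed change relative to the baseline value (negative means a reduction)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (new - baseline) / baseline


# Toy country-level labels, invented purely for illustration.
truths = ["France", "Japan", "Brazil", "Kenya"]
baseline_preds = ["France", None, None, "Peru"]
agent_preds = ["France", "Japan", None, "Kenya"]

baseline_acc = accuracy(baseline_preds, truths)
agent_acc = accuracy(agent_preds, truths)
unknown_drop = relative_change(
    unknown_rate(baseline_preds), unknown_rate(agent_preds)
)
```

On this toy data the baseline answers one of four images correctly and declines two, while the agent answers three correctly and declines one, so the "unknown" rate is halved in relative terms. The same functions could be applied per granularity level (country, city) by swapping the label lists.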