Beyond Pixels: Semantic-aware Typographic Attack for Geo-Privacy Protection

📅 2025-11-16

🤖 AI Summary

Large Vision-Language Models (LVLMs) can infer geographic locations from user-shared images, posing severe privacy risks. Existing adversarial perturbation methods need strong image distortions to be effective, which degrades visual quality and shareability. This paper proposes a semantics-aware, two-stage typographic attack framework that appends adversarially optimized, misleading textual annotations *outside* the original image, without any pixel-level modification of the image content, to disrupt LVLMs' geolocation reasoning. By combining semantic deception strategies with an in-depth analysis of LVLM comprehension mechanisms, the approach is presented as the first "zero-pixel-modification" defense against geographic inference. Evaluated on three benchmark datasets against five state-of-the-art commercial LVLMs, it reduces average localization accuracy by over 40% while preserving near-perfect visual fidelity, significantly outperforming existing baselines.

📝 Abstract

Large Visual Language Models (LVLMs) now pose a serious yet overlooked privacy threat, as they can infer a social media user's geolocation directly from shared images, leading to unintended privacy leakage. While adversarial image perturbations provide a potential direction for geo-privacy protection, they require relatively strong distortions to be effective against LVLMs, which noticeably degrade visual quality and diminish an image's value for sharing. To overcome this limitation, we identify typographical attacks as a promising direction for protecting geo-privacy by adding a text extension outside the visual content. We further investigate which textual semantics are effective in disrupting geolocation inference and design a two-stage, semantics-aware typographical attack that generates deceptive text to protect user privacy. Extensive experiments across three datasets demonstrate that our approach significantly reduces the geolocation prediction accuracy of five state-of-the-art commercial LVLMs, establishing a practical and visually preserving protection strategy against emerging geo-privacy threats.
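The abstract reports a drop in geolocation prediction accuracy but this page does not say how accuracy is scored. As a rough illustration only, the sketch below assumes a distance-threshold metric: a prediction counts as correct when it falls within `threshold_km` of the true coordinates, using the haversine great-circle distance. The function names, the threshold, and the metric itself are assumptions made for this example, not details taken from the paper.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def accuracy_within(preds: Sequence[Coord], truths: Sequence[Coord], threshold_km: float) -> float:
    """Fraction of predictions within threshold_km of the ground truth (assumed metric)."""
    if len(preds) != len(truths) or not truths:
        raise ValueError("preds and truths must be non-empty and of equal length")
    hits = sum(haversine_km(p, t) <= threshold_km for p, t in zip(preds, truths))
    return hits / len(truths)


def accuracy_drop(clean: Sequence[Coord], protected: Sequence[Coord],
                  truths: Sequence[Coord], threshold_km: float) -> float:
    """Absolute drop in accuracy between clean and protected images."""
    return accuracy_within(clean, truths, threshold_km) - accuracy_within(protected, truths, threshold_km)
```

Under this assumed metric, a protection method is more effective the larger the value returned by `accuracy_drop` for the same model and image set.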

Problem

Research questions and friction points this paper is trying to address.

Protecting geo-privacy from LVLM inference via images

Reducing adversarial distortion while preserving visual quality

Developing semantics-aware typographic attacks to mislead location prediction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses typographic attacks outside visual content

Generates deceptive text to disrupt geolocation inference

Implements a two-stage, semantics-aware attack strategy
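The idea of placing text outside the visual content can be sketched as a canvas extension: the original pixels are copied unchanged and a strip carrying the text is appended below them. This is a minimal pure-Python sketch, not the paper's implementation. The image is modeled as a list of rows of RGB tuples, `strip_from_mask` is a hypothetical stand-in for a real font rasterizer, and choosing which deceptive text to render (the two-stage, semantics-aware part) is out of scope here.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]

WHITE: Pixel = (255, 255, 255)
BLACK: Pixel = (0, 0, 0)


def strip_from_mask(mask: Sequence[str], ink: Pixel = BLACK, background: Pixel = WHITE) -> Image:
    """Hypothetical stand-in for text rendering: '#' becomes ink, anything else background."""
    return [[ink if ch == "#" else background for ch in row] for row in mask]


def append_text_strip(image: Image, strip: Image, gap: int = 0, background: Pixel = WHITE) -> Image:
    """Return a new image with `strip` placed below `image`.

    Original rows are copied as-is; rows are only padded on the right
    with `background` when the strip is wider than the image.
    """
    if not image or not image[0]:
        raise ValueError("image must be non-empty")
    width = max(len(image[0]), max((len(row) for row in strip), default=0))

    def pad(row: Sequence[Pixel]) -> List[Pixel]:
        return list(row) + [background] * (width - len(row))

    out = [pad(row) for row in image]
    out += [[background] * width for _ in range(gap)]  # blank separator rows
    out += [pad(row) for row in strip]
    return out


def original_untouched(original: Image, extended: Image) -> bool:
    """Check that every original pixel appears unchanged at the same position."""
    return all(extended[y][: len(row)] == list(row) for y, row in enumerate(original))
```

A caller would rasterize the chosen text into a strip, call `append_text_strip`, and can use `original_untouched` to confirm that the shared photo itself was not altered.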
