🤖 AI Summary
This study addresses the critical gap in privacy evaluation of vision-language models (VLMs) by incorporating the dynamic nature of individual online visibility. The authors propose the first PII safety assessment framework grounded in a continuous spectrum of online visibility, introducing PIIVisBench—a benchmark comprising 4,000 probes—and categorizing 200 subjects into four visibility tiers: high, medium, low, and zero. They systematically evaluate the refusal and conditional disclosure rates of 18 open-source VLMs under PII-related queries. Through large-scale probing, hierarchical visibility design, and adversarial techniques including rephrasing and jailbreaking prompts, the study reveals that models are significantly more prone to disclose information about highly visible individuals (disclosure rate drops from 9.10% to 5.34% as visibility decreases) and uncovers heterogeneity across model families, variations by PII type, and critical security vulnerabilities.
📝 Abstract
Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy as a static extraction task and ignore how a subject's online presence--the volume of their data available online--influences privacy alignment. We introduce PII-VisBench, a novel benchmark containing 4000 unique probes designed to evaluate VLM safety through the continuum of online presence. The benchmark stratifies 200 subjects into four visibility categories: high, medium, low, and zero--based on the extent and nature of their information available online. We evaluate 18 open-source VLMs (0.3B-32B) based on two key metrics: percentage of PII probing queries refused (Refusal Rate) and the fraction of non-refusal responses flagged for containing PII (Conditional PII Disclosure Rate). Across models, we observe a consistent pattern: refusals increase and PII disclosures decrease (9.10% high to 5.34% low) as subject visibility drops. We identify that models are more likely to disclose PII for high-visibility subjects, alongside substantial model-family heterogeneity and PII-type disparities. Finally, paraphrasing and jailbreak-style prompts expose attack and model-dependent failures, motivating visibility-aware safety evaluation and training interventions.