🤖 AI Summary
This study addresses visual privacy risks in multimodal AI, revealing significant inconsistencies in how current vision-language models (VLMs) understand and apply contextual privacy principles. In response, we propose the first scalable, multi-level visual privacy taxonomy grounded in legal frameworks such as the GDPR, together with fine-grained evaluation criteria. We further introduce VisPrivBench, a cross-scenario, multidimensional benchmark that systematically assesses VLMs' privacy awareness across sensitive-content detection, context-sensitive judgment, and compliance-aligned response generation. Empirical evaluation shows that state-of-the-art VLMs exhibit weak and highly unstable performance across all three dimensions. Our work delivers the first standardized, privacy-specific evaluation suite for vision-language models and underscores both the urgency and the feasibility of developing legally aligned, context-aware multimodal AI systems.
📝 Abstract
Artificial Intelligence has profoundly transformed the technological landscape in recent years. Large Language Models (LLMs) have demonstrated impressive abilities in reasoning, text comprehension, contextual pattern recognition, and the integration of language with visual understanding. While these advances offer significant benefits, they also reveal critical limitations in the models' ability to grasp the notion of privacy. There is therefore substantial interest in determining whether, and how, these models can understand and enforce privacy principles, particularly given the lack of resources for evaluating this capability. In this work, we address these challenges by examining how legal frameworks can inform the capabilities of these emerging technologies. To this end, we introduce a comprehensive, multi-level Visual Privacy Taxonomy that captures a wide range of privacy issues and is designed to be scalable and adaptable to existing and future research needs. Furthermore, we evaluate several state-of-the-art Vision-Language Models (VLMs), revealing significant inconsistencies in their understanding of contextual privacy. Our work contributes both a foundational taxonomy for future research and a critical benchmark of current model limitations, demonstrating the urgent need for more robust, privacy-aware AI systems.