🤖 AI Summary
Prior work directly adapts general-purpose vision-language models (VLMs) for image privacy classification but lacks systematic, fair zero-shot comparisons against specialized models, obscuring performance ceilings and modality-specific advantages.
Method: We introduce the first zero-shot image privacy classification benchmark and propose task-aligned prompting strategies to rigorously evaluate three leading open-source VLMs against lightweight specialized vision models.
Contribution/Results: Our evaluation spans accuracy, inference efficiency, and robustness to adversarial and natural perturbations (e.g., distortion, compression, occlusion). Results show that despite their larger parameter counts and slower inference, VLMs underperform specialized models in accuracy, yet their cross-modal representations confer significantly greater robustness to common image corruptions. This work clarifies the unique value and practical limits of multimodal models in privacy-sensitive applications.
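The natural perturbations named above (distortion, compression, occlusion) can be simulated with standard image operations. A minimal sketch using Pillow, assuming illustrative corruption parameters (the paper's exact settings are not given here):

```python
# Hedged sketch of natural image perturbations used for robustness
# probing. Function names and severity values are illustrative
# assumptions, not the paper's exact protocol.
import io
from PIL import Image, ImageFilter

def jpeg_compress(img: Image.Image, quality: int = 10) -> Image.Image:
    """Re-encode at low JPEG quality to simulate compression artifacts."""
    buf = io.BytesIO()
    img.convert("RGB").save(buf, format="JPEG", quality=quality)
    buf.seek(0)
    return Image.open(buf).convert("RGB")

def gaussian_blur(img: Image.Image, radius: int = 3) -> Image.Image:
    """Blur the image to simulate distortion / defocus."""
    return img.filter(ImageFilter.GaussianBlur(radius))

def occlude(img: Image.Image, frac: float = 0.25) -> Image.Image:
    """Black out a central square covering `frac` of each dimension."""
    out = img.copy()
    w, h = out.size
    bw, bh = int(w * frac), int(h * frac)
    x0, y0 = (w - bw) // 2, (h - bh) // 2
    out.paste((0, 0, 0), (x0, y0, x0 + bw, y0 + bh))
    return out
```

Each perturbed copy of a test image is then classified by every model, and accuracy under corruption is compared to clean-image accuracy.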
📄 Abstract
While specialized learning-based models have historically dominated image privacy prediction, the current literature increasingly favours adopting large Vision-Language Models (VLMs) designed for generic tasks. Without systematic evaluation, this trend risks overlooking the performance ceiling set by purpose-built models. To address this problem, we establish a zero-shot benchmark for image privacy classification that enables a fair comparison. We evaluate the three top-ranked open-source VLMs on a privacy benchmark using task-aligned prompts, and contrast their performance, efficiency, and robustness against established vision-only and multi-modal methods. Counter-intuitively, our results show that VLMs, despite their resource-intensive nature (high parameter counts and slower inference), currently lag behind specialized, smaller models in privacy prediction accuracy. We also find that VLMs exhibit higher robustness to image perturbations.
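A task-aligned prompt of the kind described above might look as follows. This is a hedged sketch: the prompt wording, the binary label set, and the `parse_label` fallback are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical task-aligned zero-shot prompt for binary image privacy
# classification; the wording and parsing rule are assumptions.
PRIVACY_PROMPT = (
    "You are auditing images for privacy risk. An image is PRIVATE if "
    "sharing it publicly could expose personal information (faces, "
    "identity documents, home interiors, medical or financial content); "
    "otherwise it is PUBLIC. Answer with exactly one word: "
    "'private' or 'public'."
)

def parse_label(response: str) -> str:
    """Map a free-form VLM response to a binary label, defaulting to
    'public' when the answer is ambiguous."""
    return "private" if "private" in response.strip().lower() else "public"

# A VLM call would then look like:
#   label = parse_label(vlm.generate(image, PRIVACY_PROMPT))
```

Constraining the output space to two tokens makes zero-shot predictions directly comparable with the binary outputs of specialized classifiers.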