🤖 AI Summary
Vision-language models (VLMs) exhibit poor recognition accuracy for visual privacy content, such as passports and fingerprints, and existing evaluation datasets suffer from inconsistent labeling, hindering rigorous privacy-safety assessment and optimization.
Method: We introduce PrivBench, the first benchmark dedicated to visual privacy understanding, and construct PrivTune, a lightweight instruction-tuning dataset. Leveraging models like TinyLLaVa and MiniGPT-v2, we propose a privacy-aware few-shot instruction-tuning paradigm that preserves general vision-language capabilities (e.g., VQA) while enhancing privacy-sensitive image recognition.
Contribution/Results: Our approach achieves state-of-the-art performance on PrivBench, surpassing GPT-4V, without degrading general-purpose capabilities. This work establishes the first systematic evaluation standard and an efficient adaptation framework for visual privacy in VLMs, providing foundational support for privacy-aware VLM research.
📝 Abstract
This paper aims to advance our understanding of how Visual Language Models (VLMs) handle privacy-sensitive information, a crucial concern as these technologies become integral to everyday life. To this end, we introduce a new benchmark, PrivBench, which contains images from 8 sensitive categories such as passports or fingerprints. We evaluate 10 state-of-the-art VLMs on this benchmark and observe a generally limited understanding of privacy, highlighting a significant area for model improvement. Based on this, we introduce PrivTune, a new instruction-tuning dataset aimed at equipping VLMs with knowledge about visual privacy. By tuning two pretrained VLMs, TinyLLaVa and MiniGPT-v2, on this small dataset, we achieve strong gains in their ability to recognize sensitive content, outperforming even GPT-4V. At the same time, we show that privacy-tuning only minimally affects the VLMs' performance on standard benchmarks such as VQA. Overall, this paper lays out a crucial challenge for making VLMs effective in handling real-world data safely and provides a simple recipe that takes the first step towards building privacy-aware VLMs.
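Benchmarking recognition accuracy across sensitive categories, as described above, reduces to a per-category scoring loop. A minimal sketch of such scoring; the record format and category names below are illustrative assumptions, not PrivBench's actual schema:

```python
from collections import defaultdict

def per_category_accuracy(records):
    """Aggregate recognition accuracy for each sensitive category.

    records: iterable of (category, is_correct) pairs, where is_correct
    indicates whether the VLM correctly recognized the sensitive content.
    (Hypothetical format -- the real benchmark schema may differ.)
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, is_correct in records:
        total[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / total[c] for c in total}

# Toy example with two of the eight sensitive categories.
records = [
    ("passport", True), ("passport", False),
    ("fingerprint", True), ("fingerprint", True),
]
print(per_category_accuracy(records))
# → {'passport': 0.5, 'fingerprint': 1.0}
```

Reporting accuracy per category rather than a single aggregate makes it visible which kinds of sensitive content a model handles worst, which is what motivates targeted tuning data such as PrivTune.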