🤖 AI Summary
To address privacy leakage risks arising from image capture in visual language model (VLM)-assisted systems for blind and low-vision users—where coarse-grained masking severely degrades image utility—this paper proposes FiGPriv, a fine-grained privacy-preserving framework. FiGPriv first performs fine-grained image segmentation to precisely localize sensitive regions; second, it introduces a data-driven risk scoring model to quantify the privacy leakage risk of each region; and third, it applies a selective masking strategy that suppresses high-risk content while minimizing information loss. Evaluated on the newly constructed BIV-Priv-Seg dataset, FiGPriv achieves a 26% improvement in content retention, an 11% gain in the usefulness of VLM responses, and a 45% increase in image content recognition accuracy over baseline methods. The authors position this as the first work to jointly optimize privacy protection and functional utility for blind-assistance scenarios.
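The three-stage pipeline above (segment, score, selectively mask) can be sketched in miniature. This is an illustrative sketch only: the `Region` class, the lookup-table `risk_score`, and the `selective_mask` threshold are all hypothetical stand-ins, not the paper's actual segmentation model or data-driven risk scorer.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Region:
    # Hypothetical output of a fine-grained segmenter:
    # a semantic label plus the pixel coordinates of the segment.
    label: str
    pixels: List[Tuple[int, int]]  # (x, y) coordinates

def risk_score(region: Region) -> float:
    """Stand-in for the data-driven risk scorer: a fixed lookup table
    assigning higher scores to more sensitive content types."""
    table = {"credit_card_number": 0.95, "address_line": 0.80, "shirt_logo": 0.10}
    return table.get(region.label, 0.50)

def selective_mask(image, regions, threshold=0.7):
    """Zero out only the pixels of regions whose risk exceeds the
    threshold, leaving low-risk content intact to preserve utility."""
    masked = {p for r in regions if risk_score(r) >= threshold for p in r.pixels}
    return [[0 if (x, y) in masked else image[y][x]
             for x in range(len(image[0]))]
            for y in range(len(image))]

# Usage: a toy 2x2 "image"; only the high-risk region is suppressed.
regions = [Region("credit_card_number", [(0, 0)]),
           Region("shirt_logo", [(1, 1)])]
print(selective_mask([[1, 2], [3, 4]], regions))  # → [[0, 2], [3, 4]]
```

The key design point, as described in the summary, is that masking is applied per region according to its risk score rather than uniformly over every private object, which is what lets the framework retain more usable image content.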
📝 Abstract
As visual assistant systems powered by visual language models (VLMs) become more prevalent, concerns over user privacy have grown, particularly for blind and low-vision users who may unknowingly capture personal private information in their images. Existing privacy protection methods rely on coarse-grained segmentation, which uniformly masks entire private objects, often at the cost of usability. In this work, we propose FiGPriv, a fine-grained privacy protection framework that selectively masks only high-risk private information while preserving low-risk information. Our approach integrates fine-grained segmentation with a data-driven risk scoring mechanism. We evaluate our framework on the BIV-Priv-Seg dataset and show that FiGPriv preserves 26% more image content, improving the ability of VLMs to provide useful responses by 11% and to identify the image content by 45%, while ensuring privacy protection. Project Page: https://artcs1.github.io/VLMPrivacy/