VPD-100K: Towards Generalizable and Fine-grained Visual Privacy Protection

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing visual privacy datasets suffer from limited scale, coarse-grained annotations, and narrow domain coverage, which hinders robust privacy detection in complex real-world environments. To address these limitations, this work introduces VPD-100K, a large-scale, fine-grained visual privacy dataset of 100,000 images organized under a taxonomy of four domains (Human Presence, On-Screen PII, Physical Identifiers, and Location Indicators), with 33 fine-grained classes and over 190,000 annotated instances. The dataset is characterized by long-tailed class distributions, small object scales, and high visual complexity, which makes it relevant to unconstrained settings such as live streaming. The authors also propose a lightweight frequency-enhanced module that combines frequency-domain attention fusion with adaptive spectral gating, aiming to capture subtle details that spatial pixel intensity alone misses. Experiments on image and streaming-video benchmarks are reported to consistently demonstrate the effectiveness of both the dataset and the frequency mechanism.

📝 Abstract

Privacy protection has become a critical requirement in the era of ubiquitous visual data sharing, imposing higher demands on efficient and robust privacy detection algorithms. However, the development of robust detection models is severely hindered by the lack of comprehensive datasets. Existing privacy-oriented datasets often suffer from limited scale, coarse-grained annotations, and narrow domain coverage, failing to capture the intricate details of sensitive information in real-world environments. To bridge this gap, we present a large-scale, fine-grained Visual Privacy Dataset (VPD-100K), designed to facilitate generalized privacy detection. We establish a holistic taxonomy comprising four primary domains: Human Presence, On-Screen Personally Identifiable Information (PII), Physical Identifiers, and Location Indicators. The dataset contains 100,000 images annotated with 33 fine-grained classes and over 190,000 object instances. Statistical analysis reveals that our dataset features long-tailed distributions, small object scales, and high visual complexity. These characteristics make the dataset particularly valuable for demanding, unconstrained applications such as live streaming, where actors frequently face unintentional, real-time information leakage. Furthermore, we design an effective frequency-enhanced lightweight module, consisting of frequency-domain attention fusion and an adaptive spectral gating mechanism, that goes beyond the limitations of spatial pixel intensity to better capture the subtle details of sensitive information. Extensive experiments conducted on diverse image and streaming video benchmarks consistently demonstrate the effectiveness of our VPD-100K dataset and the well-curated frequency mechanism. The code and dataset are available at https://vpd-100k.github.io/.
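
The abstract does not specify how the frequency-enhanced module is built, so the following is only a minimal sketch of what an input-dependent spectral gate could look like, not the authors' implementation. It assumes a real-valued feature map of shape (C, H, W), and the function name `spectral_gate`, the per-channel parameters `weight` and `bias`, and the residual fusion at the end are all hypothetical choices made for illustration.

```python
import numpy as np


def sigmoid(x):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def spectral_gate(feat, weight, bias):
    """Illustrative spectral gating (assumed design, not from the paper).

    feat:   real array of shape (C, H, W), a feature map.
    weight: array of shape (C,), hypothetical learnable scale per channel.
    bias:   array of shape (C,), hypothetical learnable offset per channel.
    """
    _, height, width = feat.shape

    # Move each channel to the frequency domain (real-input 2D FFT).
    spec = np.fft.rfft2(feat, axes=(-2, -1))

    # The gate depends on the input's own log-magnitude spectrum,
    # which is what makes it "adaptive" in this sketch.
    log_mag = np.log1p(np.abs(spec))
    gate = sigmoid(weight[:, None, None] * log_mag + bias[:, None, None])

    # Rescale every frequency bin by its gate, then return to the spatial domain.
    filtered = np.fft.irfft2(spec * gate, s=(height, width), axes=(-2, -1))

    # Residual fusion of the spatial and frequency-filtered branches (assumption).
    return feat + filtered


# Usage with random features and arbitrary parameter values.
rng = np.random.default_rng(0)
features = rng.standard_normal((4, 16, 16))
enhanced = spectral_gate(features, weight=np.ones(4), bias=np.zeros(4))
print(enhanced.shape)
```

In a trained model, `weight` and `bias` would be learned, and the gate would more plausibly be produced by a small network than by a per-channel affine map. The sketch only shows the general pattern of transforming, gating, and inverting.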

Problem

Research questions and friction points this paper is trying to address.

visual privacy protection

fine-grained dataset

privacy detection

generalizable model

PII recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained visual privacy

large-scale dataset

frequency-domain attention