Privacy Beyond Pixels: Latent Anonymization for Privacy-Preserving Video Understanding

📅 2025-11-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the risk of sensitive attribute leakage (e.g., identity, gender) from latent visual features in video foundation models, this paper proposes the first general-purpose anonymization paradigm tailored for video latent spaces. The method requires no retraining of the backbone model; instead, it introduces a lightweight, plug-and-play anonymization adapter applied atop a frozen video encoder. Leveraging self-supervised privacy constraints, joint task optimization, and a latent consistency loss, the framework achieves end-to-end feature sanitization. Evaluated on Kinetics400 and UCF101, it maintains near-baseline performance across diverse downstream tasks, including action recognition, temporal action detection, and anomaly detection, with accuracy drops under 1.2%. Privacy leakage is reduced by 35%, and gender classification bias is significantly mitigated. The approach thus delivers strong privacy protection, high task utility, and improved model fairness without retraining or modifying the backbone.
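The plug-and-play setup described above can be sketched as follows: features come out of a frozen encoder once, and only a small residual adapter is trained to sanitize them. This is a minimal NumPy illustration of the data flow, not the paper's implementation; names such as `AnonymizingAdapter` and the bottleneck sizes are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def frozen_encoder(clip):
    # Stand-in for a pretrained video encoder; its weights are fixed and
    # never updated during anonymization training.
    W = np.full((clip.size, 64), 0.01)
    return clip.reshape(-1) @ W

class AnonymizingAdapter:
    """Lightweight residual bottleneck applied atop frozen features (illustrative)."""
    def __init__(self, dim=64, hidden=16):
        # Only these parameters would be trained.
        self.W1 = rng.normal(0, 0.1, (dim, hidden))
        self.W2 = rng.normal(0, 0.1, (hidden, dim))

    def __call__(self, feat):
        # Residual connection keeps sanitized features close to the originals,
        # which helps retain utility for downstream tasks.
        return feat + np.tanh(feat @ self.W1) @ self.W2

clip = rng.normal(size=(8, 3, 16, 16))   # toy clip: (frames, channels, H, W)
feat = frozen_encoder(clip)              # extracted once from the frozen backbone
adapter = AnonymizingAdapter()
sanitized = adapter(feat)                # same dimensionality, privacy-sanitized
print(sanitized.shape)                   # (64,)
```

Because the adapter preserves the feature dimensionality, existing downstream heads can consume the sanitized features without re-extraction or architectural changes.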

📝 Abstract
We introduce a novel formulation of visual privacy preservation for video foundation models that operates entirely in the latent space. While spatio-temporal features learned by foundation models have deepened general understanding of video content, sharing or storing these extracted visual features for downstream tasks inadvertently reveals sensitive personal information like skin color, gender, or clothing. Current privacy preservation methods focus on input-pixel-level anonymization, which requires retraining the entire utility video model and results in task-specific anonymization, making them unsuitable for recent video foundation models. To address these challenges, we introduce a lightweight Anonymizing Adapter Module (AAM) that removes private information from video features while retaining general task utility. AAM can be applied in a plug-and-play fashion to frozen video encoders, minimizing the computational burden of finetuning and re-extracting features. Our framework employs three newly designed training objectives: (1) a clip-level self-supervised privacy objective to reduce mutual information between static clips, (2) a co-training objective to retain utility across seen tasks, and (3) a latent consistency loss for generalization on unseen tasks. Our extensive evaluations demonstrate a significant 35% reduction in privacy leakage while maintaining near-baseline utility performance across various downstream tasks: Action Recognition (Kinetics400, UCF101, HMDB51), Temporal Action Detection (THUMOS14), and Anomaly Detection (UCF-Crime). We also provide an analysis on anonymization for sensitive temporal attribute recognition. Additionally, we propose new protocols for assessing gender bias in action recognition models, showing that our method effectively mitigates such biases and promotes more equitable video understanding.
Problem

Research questions and friction points this paper is trying to address.

Preserving visual privacy in video foundation models by latent space anonymization
Removing sensitive personal information from video features while retaining utility
Addressing privacy leakage and gender bias in video understanding tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent space anonymization for video foundation models
Lightweight plug-and-play adapter module for privacy preservation
Three novel training objectives for privacy-utility balance
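The three training objectives listed above can be sketched as one combined loss: a privacy term that discourages shared information between static clips of the same video, a utility term on a seen task, and a consistency term that keeps sanitized features near the frozen encoder's output. The concrete forms below (cosine similarity as a mutual-information proxy, cross-entropy, mean squared error, and the 0.5 weight) are assumptions for illustration, not the paper's exact estimators.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def privacy_loss(feat_a, feat_b):
    # Proxy for the clip-level self-supervised privacy objective: penalize
    # high similarity between features of two static clips (illustrative).
    return cosine(feat_a, feat_b)

def utility_loss(logits, label):
    # Co-training objective on a seen task: standard cross-entropy.
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return -np.log(p[label] + 1e-12)

def consistency_loss(sanitized, original):
    # Latent consistency: keep sanitized features close to the frozen
    # encoder's features so unseen downstream tasks still generalize.
    return float(np.mean((sanitized - original) ** 2))

rng = np.random.default_rng(1)
orig = rng.normal(size=64)                       # frozen-encoder features
sanitized = orig + 0.05 * rng.normal(size=64)    # adapter output (toy)
other_clip = rng.normal(size=64)                 # features of a second static clip
logits = rng.normal(size=10)                     # seen-task head output

total = (privacy_loss(sanitized, other_clip)
         + utility_loss(logits, label=3)
         + 0.5 * consistency_loss(sanitized, orig))
print(total)
```

In practice the three terms would be weighted and minimized jointly over the adapter's parameters alone, which is what allows the privacy-utility trade-off to be tuned without touching the backbone.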