🤖 AI Summary
This work addresses the privacy risks inherent in RGB-based action recognition, where conventional post-capture methods cannot protect sensitive information at the point of data acquisition. To overcome this limitation, the authors propose Lens Privacy Sealing (LPS), a hardware-level solution that covers the camera lens with adjustable laminating film to produce physically irreversible, stochastic multi-layer scattering before light reaches the sensor, thereby enabling acquisition-time privacy preservation. They introduce the P³AR benchmark dataset and develop MSPNet, a single-stage framework that combines an Inter-Frame Noise Suppressor (IFNS) and a Cross-Frame Semantic Aggregator (CFSA) and is enhanced by contrastive language-image pre-training. Experiments show that LPS suppresses identity recognition to low levels and resists reconstruction attacks, while MSPNet with IFNS and CFSA nearly doubles action recognition accuracy over baseline methods on the degraded video. Together they offer a low-cost, physically grounded approach to balancing privacy and utility at the data acquisition stage.
📝 Abstract
RGB camera-based surveillance systems enable human action recognition for public safety and healthcare, yet raise serious privacy concerns. Existing methods rely on post-capture algorithms, which fail to protect privacy during data acquisition. We propose Lens Privacy Sealing (LPS), a simple hardware solution that physically obscures camera lenses with adjustable laminating film, providing pre-sensor privacy protection at minimal cost. Unlike software methods or expensive engineered optics, LPS achieves strong privacy through stochastic multi-layer scattering that is physically irreversible. We introduce the P$^3$AR dataset for privacy-preserving action recognition, featuring both a large-scale replay-captured subset (P$^3$AR-NTU, 114K videos) and a real-world collected subset (P$^3$AR-PKU), each with privacy attribute annotations. To handle the video degradation caused by LPS, we propose MSPNet, a single-stage framework incorporating an Inter-Frame Noise Suppressor (IFNS) and a Cross-Frame Semantic Aggregator (CFSA), enhanced by contrastive language-image pre-training for robust semantic extraction. Extensive experiments demonstrate that MSPNet with IFNS and CFSA nearly doubles action recognition accuracy compared to baseline methods while suppressing identity recognition to low levels. Comprehensive validation shows that LPS achieves a superior privacy-utility trade-off compared to state-of-the-art hardware methods, resists reconstruction attacks including PSF inversion and data-driven recovery, and generalizes robustly across optical configurations and challenging environments. Code is available at https://github.com/wangzy01/MSPNet.
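To make the idea of pre-sensor multi-layer scattering more concrete, the following is a minimal toy sketch, not the optical model used in the paper. It assumes each film layer can be approximated as a convolution with a random, non-negative point-spread function (PSF) that sums to one. The function names (`random_psf`, `convolve2d_circular`, `scatter`), the kernel size, and the circular boundary handling are all hypothetical choices made for illustration.

```python
import numpy as np


def random_psf(size, rng):
    """Hypothetical random PSF: non-negative weights normalized to sum to 1."""
    psf = rng.random((size, size))
    return psf / psf.sum()


def convolve2d_circular(image, psf):
    """Circular 2-D convolution via FFT (toy boundary handling, an assumption)."""
    padded = np.zeros_like(image, dtype=float)
    k = psf.shape[0]
    padded[:k, :k] = psf  # zero-pad the kernel to the image size
    return np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(padded)))


def scatter(image, num_layers, psf_size=7, seed=0):
    """Apply `num_layers` successive random-PSF blurs to a grayscale image."""
    rng = np.random.default_rng(seed)
    out = image.astype(float)
    for _ in range(num_layers):
        out = convolve2d_circular(out, random_psf(psf_size, rng))
    return out


if __name__ == "__main__":
    # Stand-in for a single grayscale frame; real frames would come from video.
    frame = np.random.default_rng(1).random((64, 64))
    for layers in (1, 2, 3):
        blurred = scatter(frame, num_layers=layers)
        # Total intensity is preserved while pixel variance (fine detail) drops.
        print(layers, blurred.sum(), blurred.var())
```

In this toy model, stacking layers preserves total intensity but progressively attenuates fine spatial detail, which is the kind of degradation a downstream recognizer would have to cope with. It does not demonstrate physical irreversibility, which the abstract attributes to the real stochastic scattering of the film.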