🤖 AI Summary
This work addresses the challenge of facial expression recognition (FER) in open-set video scenarios without identity labels, where conventional methods risk leaking identity information and struggle to balance privacy preservation with recognition performance. The paper proposes the first two-stage, video-level FER framework that operates entirely without identity annotations: it first trains an identity-suppression network that leverages intra- and inter-video knowledge priors to disentangle identity and expression features, then employs a denoising module to recover expression information. Key innovations include fully unsupervised identity-expression disentanglement for open-set video FER, a knowledge-prior-driven identity-suppression mechanism, and a privacy-robustness evaluation protocol that requires no identity labels. Experiments demonstrate that the method effectively safeguards identity privacy while achieving expression recognition accuracy comparable to identity-supervised baselines across three video datasets.
📝 Abstract
Facial expression recognition relies on facial data that inherently expose identity and thus raise significant privacy concerns. Current privacy-preserving methods typically fail in realistic open-set video settings, where identities are unknown and identity labels are unavailable. We propose a two-stage framework for video-based privacy-preserving FER in challenging open-set settings that requires no identity labels at any stage. To decouple privacy and utility, we first train an identity-suppression network using intra- and inter-video knowledge priors derived from real-world videos without identity labels. This network anonymizes identity while preserving expressive cues. A subsequent denoising module restores expression-related information and helps recover FER performance. Furthermore, we introduce a falsification-based validation method that uses recognition priors to rigorously evaluate privacy robustness without requiring annotated identity labels. Experiments on three video datasets demonstrate that our method effectively protects privacy while maintaining FER accuracy comparable to identity-supervised baselines.
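The abstract's two-stage pipeline (identity suppression followed by expression denoising) can be sketched at a very high level as below. This is an illustrative toy, not the paper's method: `suppress_identity`, `denoise`, and the use of a per-video mean as an "identity proxy" (a crude stand-in for the intra-video prior that identity cues are shared across a video's frames) are all hypothetical names and simplifications introduced here for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

def suppress_identity(frames: np.ndarray, strength: float = 0.8) -> np.ndarray:
    """Stage 1 (sketch): attenuate the per-video mean feature, a toy proxy
    for identity cues that stay constant across one video's frames."""
    identity_proxy = frames.mean(axis=0, keepdims=True)
    return frames - strength * identity_proxy

def denoise(frames: np.ndarray, kernel: int = 3) -> np.ndarray:
    """Stage 2 (sketch): temporal moving-average smoothing, a toy stand-in
    for the denoising module that recovers expression-related information."""
    pad = kernel // 2
    padded = np.pad(frames, ((pad, pad), (0, 0)), mode="edge")
    return np.stack([padded[i:i + kernel].mean(axis=0)
                     for i in range(len(frames))])

video = rng.normal(size=(8, 16))        # 8 frames, 16-dim features per frame
anonymized = suppress_identity(video)   # identity proxy shrunk to 20% of original
recovered = denoise(anonymized)         # same shape, temporally smoothed
```

In the actual framework both stages are learned networks trained with intra- and inter-video priors rather than fixed transforms; the sketch only conveys the data flow, anonymize first, then restore utility.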