🤖 AI Summary
Traditional passive facial liveness detection struggles to reject presentation attacks such as printed photos, screen displays, 3D masks, and video replays. To address this limitation, this paper proposes a user-collaborative liveness detection method: users are instructed to slowly approach the camera while facing it directly, generating controlled 3D motion stimuli. The method jointly models the spatiotemporal dynamics of facial volumetric change using a dual-stream architecture over RGB frames and neural optical flow estimates. Crucially, it is the first to deeply integrate a collaborative interaction protocol with optical flow analysis to explicitly extract motion cues along the depth axis. Extensive experiments on multiple public benchmarks demonstrate significant improvements over state-of-the-art methods, with an average 32.7% reduction in Average Classification Error Rate (ACER). The approach exhibits strong robustness and practical deployability.
📝 Abstract
In this work, we propose a novel cooperative video-based face liveness detection method built on a new user interaction scenario in which participants are instructed to slowly move their frontally oriented face closer to the camera. This controlled approaching-face protocol, combined with optical flow analysis, is the core innovation of our approach. By having users follow this specific movement pattern, we enable robust extraction of facial volume information through neural optical flow estimation, significantly improving discrimination between genuine faces and various presentation attacks (including printed photos, screen displays, masks, and video replays). Our method feeds both the predicted optical flows and the RGB frames to a neural classifier, effectively leveraging spatial-temporal features for more reliable liveness detection than passive methods.
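To illustrate the intuition behind the depth-axis motion cue, here is a minimal NumPy sketch. It is not the paper's model (which uses a neural optical flow estimator and a learned dual-stream classifier); instead it uses a hand-crafted proxy: the divergence of a dense flow field. A genuine 3D face approaching the camera produces a radially expanding ("looming") flow with positive divergence, while a translated flat photo produces near-zero divergence. The `threshold` value and the synthetic flow fields are illustrative assumptions.

```python
import numpy as np

def flow_divergence(flow):
    """Mean divergence of a dense optical-flow field of shape (H, W, 2).

    An approaching 3D surface yields radially expanding flow (positive
    divergence); in-plane translation of a flat surface yields ~zero.
    """
    du_dx = np.gradient(flow[..., 0], axis=1)  # d(horizontal flow)/dx
    dv_dy = np.gradient(flow[..., 1], axis=0)  # d(vertical flow)/dy
    return float(np.mean(du_dx + dv_dy))

def is_live(flow, threshold=0.01):
    """Toy decision rule: positive expansion => genuine approaching face."""
    return flow_divergence(flow) > threshold

# Synthetic flow fields on a 64x64 grid centred at the origin.
h = w = 64
ys, xs = np.meshgrid(np.arange(h) - h / 2,
                     np.arange(w) - w / 2, indexing="ij")

# Genuine approach: flow points radially away from the image centre.
expanding = np.stack([0.05 * xs, 0.05 * ys], axis=-1)

# Flat replay attack moved toward the camera rig: uniform translation.
translating = np.full((h, w, 2), [1.0, 0.0])

print(is_live(expanding))    # expanding field, positive divergence
print(is_live(translating))  # pure translation, zero divergence
```

In the actual method, a classifier learns such spatiotemporal patterns directly from the flow and RGB streams rather than relying on a single hand-tuned statistic.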