Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing liveness detection systems typically model physical presentation attacks (e.g., printed photos, 3D masks) and digital forgeries (e.g., DeepFakes) separately, resulting in model redundancy, high inference latency, and poor robustness against hybrid attacks. This paper proposes a unified liveness detection framework that leverages contrastive learning with an automatic pair-sampling mechanism to jointly model both attack types within a single architecture and learn modality-agnostic discriminative features. The method employs a lightweight network that trains end to end in under one hour and runs efficiently at 4.46 GFLOPs. Evaluated on the 6th Face Anti-Spoofing Challenge benchmark, it achieves a state-of-the-art average classification error rate (ACER) of 2.10%. By reducing system complexity while improving robustness against composite threats, the framework offers a scalable, practical solution for real-world deployment.

📝 Abstract
Modern face recognition systems remain vulnerable to spoofing attempts, including both physical presentation attacks and digital forgeries. Traditionally, these two attack vectors have been handled by separate models, each targeting its own artifacts and modalities. However, maintaining distinct detectors increases system complexity and inference latency and leaves systems exposed to combined attack vectors. We propose the Paired-Sampling Contrastive Framework, a unified training approach that leverages automatically matched pairs of genuine and attack selfies to learn modality-agnostic liveness cues. Evaluated on the 6th Face Anti-Spoofing Challenge Unified Physical-Digital Attack Detection benchmark, our method achieves an average classification error rate (ACER) of 2.10 percent, outperforming prior solutions. The framework is lightweight (4.46 GFLOPs) and trains in under one hour, making it practical for real-world deployment. Code and pretrained models are available at https://github.com/xPONYx/iccv2025_deepfake_challenge.
Problem

Research questions and friction points this paper is trying to address.

Detecting both physical and digital face spoofing attacks
Unifying separate detectors to reduce system complexity
Learning modality-agnostic liveness cues from genuine-attack pairs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Paired-Sampling Contrastive Framework for unified training
Leverages matched genuine-attack pairs for liveness cues
Lightweight model with efficient real-world deployment capability
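To make the paired-sampling idea concrete, here is a minimal sketch (not the authors' implementation) of a hinge-style contrastive loss over matched genuine-attack pairs: each genuine embedding is paired with an attack embedding from the same subject, and the loss pushes every such pair at least a margin apart, regardless of whether the attack is physical or digital. The function name and margin value are illustrative assumptions.

```python
import numpy as np

def paired_contrastive_loss(genuine_emb, attack_emb, margin=1.0):
    """Contrastive loss for matched genuine/attack embedding pairs.

    Row i of genuine_emb is paired with row i of attack_emb (e.g. the
    same subject under similar capture conditions). Every pair is a
    negative (live vs. spoof) pair, so the loss penalizes pairs whose
    Euclidean distance falls below `margin`, encouraging a
    modality-agnostic separation between live and attack samples.
    """
    d = np.linalg.norm(genuine_emb - attack_emb, axis=1)  # per-pair distance
    return np.mean(np.maximum(0.0, margin - d) ** 2)

# Toy usage: pairs already separated beyond the margin incur zero loss.
g = np.array([[1.0, 0.0], [0.0, 1.0]])   # "genuine" embeddings
a = np.array([[-1.0, 0.0], [0.0, -1.0]]) # paired "attack" embeddings
loss = paired_contrastive_loss(g, a)     # distances are 2.0 > margin
```

In a full pipeline this term would typically be combined with a standard binary live/spoof classification loss; the pairing itself is what forces the encoder to focus on liveness cues shared across attack modalities rather than modality-specific artifacts.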
Authors

Andrei Balykin
Research Engineer at ID R&D Inc.
Speech processing, Deep Learning, Machine Learning

Anvar Ganiev
IDRND

Denis Kondranin
IDRND

Kirill Polevoda
IDRND

Nikolai Liudkevich
IDRND

Artem Petrov
Palisade Research