Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing liveness detection systems typically model physical presentation attacks (e.g., printed photos, 3D masks) and digital forgeries (e.g., DeepFakes) separately, resulting in model redundancy, high inference latency, and poor robustness against hybrid attacks. This paper proposes a unified liveness detection framework that leverages contrastive learning with an automatic pair-sampling mechanism to jointly model both attack types within a single architecture and learn modality-agnostic discriminative features. The method employs a lightweight network that trains end to end in under one hour and runs efficiently at 4.46 GFLOPs. Evaluated on the 6th Face Anti-Spoofing Challenge benchmark, it achieves a state-of-the-art average classification error rate (ACER) of 2.10%. By reducing system complexity while improving robustness against composite threats, the framework offers a scalable, practical solution for real-world deployment.

📝 Abstract
Modern face recognition systems remain vulnerable to spoofing attempts, including both physical presentation attacks and digital forgeries. Traditionally, these two attack vectors have been handled by separate models, each targeting its own artifacts and modalities. However, maintaining distinct detectors increases system complexity and inference latency and leaves systems exposed to combined attack vectors. We propose the Paired-Sampling Contrastive Framework, a unified training approach that leverages automatically matched pairs of genuine and attack selfies to learn modality-agnostic liveness cues. Evaluated on the 6th Face Anti-Spoofing Challenge Unified Physical-Digital Attack Detection benchmark, our method achieves an average classification error rate (ACER) of 2.10 percent, outperforming prior solutions. The framework is lightweight (4.46 GFLOPs) and trains in under one hour, making it practical for real-world deployment. Code and pretrained models are available at https://github.com/xPONYx/iccv2025_deepfake_challenge.
Problem

Research questions and friction points this paper is trying to address.

Detecting both physical and digital face spoofing attacks
Unifying separate detectors to reduce system complexity
Learning modality-agnostic liveness cues from genuine-attack pairs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Paired-Sampling Contrastive Framework for unified training
Leverages matched genuine-attack pairs for liveness cues
Lightweight model with efficient real-world deployment capability
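To make the paired-sampling idea concrete, here is a minimal sketch (not the authors' implementation) of a hinge-style contrastive loss over matched genuine-attack pairs: each genuine embedding is paired with an attack embedding from the same subject, and the loss pushes every such pair at least a margin apart, regardless of whether the attack is physical or digital. The function name and margin value are illustrative assumptions.

```python
import numpy as np

def paired_contrastive_loss(genuine_emb, attack_emb, margin=1.0):
    """Contrastive loss for matched genuine/attack embedding pairs.

    Row i of genuine_emb is paired with row i of attack_emb (e.g. the
    same subject under similar capture conditions). Every pair is a
    negative (live vs. spoof) pair, so the loss penalizes pairs whose
    Euclidean distance falls below `margin`, encouraging a
    modality-agnostic separation between live and attack samples.
    """
    d = np.linalg.norm(genuine_emb - attack_emb, axis=1)  # per-pair distance
    return np.mean(np.maximum(0.0, margin - d) ** 2)

# Toy usage: pairs already separated beyond the margin incur zero loss.
g = np.array([[1.0, 0.0], [0.0, 1.0]])   # "genuine" embeddings
a = np.array([[-1.0, 0.0], [0.0, -1.0]]) # paired "attack" embeddings
loss = paired_contrastive_loss(g, a)     # distances are 2.0 > margin
```

In a full pipeline this term would typically be combined with a standard binary live/spoof classification loss; the pairing itself is what forces the encoder to focus on liveness cues shared across attack modalities rather than modality-specific artifacts.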
Authors

Andrei Balykin
Research Engineer at ID R&D Inc.
Speech processing, Deep Learning, Machine Learning

Anvar Ganiev
IDRND

Denis Kondranin
IDRND

Kirill Polevoda
IDRND

Nikolai Liudkevich
IDRND

Artem Petrov
Palisade Research