SLIP: Spoof-Aware One-Class Face Anti-Spoofing with Language Image Pretraining

📅 2025-03-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
One-class face anti-spoofing (FAS) suffers from poor cross-domain generalization because models trained solely on bona fide samples often erroneously rely on domain-specific cues—such as facial content—rather than genuine liveness indicators. To address this, we propose the first CLIP-based one-class FAS framework, comprising three novel components: (1) language-guided spoof cue map estimation, (2) prompt-driven liveness feature disentanglement, and (3) latent-space fusion for pseudo-spoof feature augmentation. This design enables explicit modeling of spoofing cues, robust separation of liveness-relevant features from spurious correlations, and generation of diverse anomaly representations. Our method achieves state-of-the-art performance across multiple cross-domain FAS benchmarks. Ablation studies validate the efficacy of each module, demonstrating strong generalization and robustness to domain shifts.
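The core one-class assumption behind component (1) — that live faces yield (near-)zero spoof cue maps, so any nonzero response signals an attack — can be sketched as a simple anomaly score. This is an illustrative numpy sketch under that assumption only; SLIP's actual language-guided estimator that produces the cue map is abstracted away here.

```python
import numpy as np

def spoofness_score(cue_map):
    # Live faces are assumed to produce an (almost) all-zero spoof cue map,
    # so the mean absolute response of the map serves as an anomaly score:
    # near zero for live inputs, larger when attack-related cues are present.
    return float(np.abs(cue_map).mean())

# Toy cue maps standing in for the estimator's output (hypothetical values).
live_map = np.zeros((32, 32))        # ideal prediction for a live face
spoof_map = np.full((32, 32), 0.5)   # attack-covered regions respond strongly

assert spoofness_score(live_map) < spoofness_score(spoof_map)
```

A threshold on this score then separates live from spoof at test time, which is why the framework trains the estimator to emit nonzero maps only for simulated attack coverage.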

📝 Abstract
Face anti-spoofing (FAS) plays a pivotal role in ensuring the security and reliability of face recognition systems. With advancements in vision-language pretrained (VLP) models, recent two-class FAS techniques have leveraged the advantages of VLP guidance, while this potential remains unexplored in one-class FAS methods. One-class FAS focuses on learning intrinsic liveness features solely from live training images to differentiate between live and spoof faces. However, the lack of spoof training data can lead one-class FAS models to inadvertently incorporate domain information irrelevant to the live/spoof distinction (e.g., facial content), causing performance degradation when tested in a new application domain. To address this issue, we propose a novel framework called Spoof-aware one-class face anti-spoofing with Language Image Pretraining (SLIP). Given that live faces should ideally not be obscured by any spoof-attack-related objects (e.g., paper or masks) and are assumed to yield zero spoof cue maps, we first propose an effective language-guided spoof cue map estimation to enhance one-class FAS models by simulating whether the underlying faces are covered by attack-related objects and generating corresponding nonzero spoof cue maps. Next, we introduce a novel prompt-driven liveness feature disentanglement to alleviate live/spoof-irrelevant domain variations by disentangling live/spoof-relevant and domain-dependent information. Finally, we design an effective augmentation strategy that fuses latent features from live images and spoof prompts to generate spoof-like image features, thereby diversifying latent spoof features to facilitate the learning of one-class FAS. Our extensive experiments and ablation studies show that SLIP consistently outperforms previous one-class FAS methods.
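The augmentation step described last — fusing latent features of live images with spoof prompt features to synthesize pseudo-spoof features — can be sketched as a convex combination in a shared CLIP-style embedding space. The mixing weight `alpha`, the 512-dimensional feature size, and the random stand-in features are all assumptions for illustration; the paper's actual fusion module and encoders are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(x, axis=-1):
    # CLIP-style encoders place features on the unit hypersphere.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def fuse_pseudo_spoof(live_feat, spoof_prompt_feat, alpha=0.3):
    # Convex combination of a live image feature with a spoof prompt's
    # text feature, re-normalized so the pseudo-spoof feature stays on
    # the same unit hypersphere as real features (illustrative choice).
    fused = (1 - alpha) * live_feat + alpha * spoof_prompt_feat
    return l2_normalize(fused)

# Random stand-ins for encoder outputs (hypothetical, 512-d like CLIP ViT-B).
live = l2_normalize(rng.normal(size=(4, 512)))    # batch of live image features
prompt = l2_normalize(rng.normal(size=(512,)))    # one spoof-prompt text feature

pseudo = fuse_pseudo_spoof(live, prompt)
# The synthesized features drift toward the spoof prompt, giving the
# one-class model diverse "anomalous" samples to learn against.
assert np.all(pseudo @ prompt > live @ prompt)
```

Pulling live features partway toward varied spoof prompts is one plausible way to diversify the latent spoof distribution without any real spoof images, which is the stated goal of this component.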
Problem

Research questions and friction points this paper is trying to address.

Enhancing one-class face anti-spoofing using vision-language pretraining
Reducing domain-irrelevant features in live/spoof classification
Generating diverse spoof features to improve model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-guided spoof cue map estimation
Prompt-driven liveness feature disentanglement
Augmentation by fusing live and spoof features