Steering Vision-Language Pre-trained Models for Incremental Face Presentation Attack Detection

📅 2025-12-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In privacy-sensitive scenarios, face presentation attack detection (PAD) must continually adapt to emerging spoofing strategies and cross-domain distribution shifts, yet cannot retain historical data due to privacy constraints. To address this challenge, we propose SVLP-IL—a replay-free incremental learning framework built upon vision-language pre-trained (VLP) models. SVLP-IL introduces a novel multi-aspect prompting (MAP) mechanism to achieve task-adaptive semantic alignment across incremental PAD tasks, and integrates selective elastic weight consolidation (SEWC) to dynamically safeguard critical parameters, thereby balancing knowledge stability and plasticity. Evaluated on multiple PAD benchmarks, SVLP-IL significantly mitigates catastrophic forgetting and achieves an average 8.3% improvement in cross-domain detection accuracy. The framework enables long-term, privacy-compliant, and robust liveness-detection deployment without requiring access to past training data.

📝 Abstract
Face Presentation Attack Detection (PAD) demands incremental learning (IL) to combat evolving spoofing tactics and domains. Privacy regulations, however, forbid retaining past data, necessitating rehearsal-free IL (RF-IL). Vision-Language Pre-trained (VLP) models, with their prompt-tunable cross-modal representations, enable efficient adaptation to new spoofing styles and domains. Capitalizing on this strength, we propose SVLP-IL, a VLP-based RF-IL framework that balances stability and plasticity via Multi-Aspect Prompting (MAP) and Selective Elastic Weight Consolidation (SEWC). MAP isolates domain dependencies, enhances distribution-shift sensitivity, and mitigates forgetting by jointly exploiting universal and domain-specific cues. SEWC selectively preserves critical weights from previous tasks, retaining essential knowledge while allowing flexibility for new adaptations. Comprehensive experiments across multiple PAD benchmarks show that SVLP-IL significantly reduces catastrophic forgetting and enhances performance on unseen domains. SVLP-IL offers a privacy-compliant, practical solution for robust lifelong PAD deployment in RF-IL settings.
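The SEWC component described in the abstract builds on elastic weight consolidation, which adds a quadratic penalty on drift away from parameters that were important for earlier tasks; "selective" suggests applying that penalty only to the most important parameters rather than all of them. A minimal sketch of that idea follows — the top-fraction ranking rule, function name, and hyperparameters are illustrative assumptions, not the paper's actual procedure:

```python
import numpy as np

def selective_ewc_penalty(theta, theta_old, fisher, keep_frac=0.2, lam=100.0):
    """Hypothetical selective-EWC penalty sketch.

    Penalizes drift (theta - theta_old) only on the top `keep_frac`
    fraction of parameters ranked by their Fisher importance, leaving
    the remaining parameters free to adapt to the new task.
    """
    k = max(1, int(keep_frac * fisher.size))
    important = np.argsort(fisher)[-k:]      # indices of most important params
    mask = np.zeros_like(fisher)
    mask[important] = 1.0
    drift = theta - theta_old
    # Standard EWC quadratic penalty, restricted to the selected subset.
    return 0.5 * lam * np.sum(mask * fisher * drift ** 2)
```

Under this sketch, drift on low-importance parameters is cost-free (plasticity), while drift on high-importance parameters is heavily penalized (stability), which matches the stability–plasticity trade-off the abstract describes.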
Problem

Research questions and friction points this paper is trying to address.

Incremental learning for evolving face spoofing detection
Privacy-compliant rehearsal-free adaptation to new domains
Balancing stability and plasticity in vision-language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-Aspect Prompting isolates domain dependencies and cues
Selective Elastic Weight Consolidation preserves critical previous task weights
Vision-Language Pre-trained models enable adaptation via prompt tuning
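The prompt-tuning idea in the bullets above — combining universal and domain-specific cues — can be illustrated with a minimal CoOp-style sketch. All names, shapes, and the composition rule below are hypothetical assumptions for illustration, not the paper's implementation:

```python
import numpy as np

# Hypothetical multi-aspect prompt composition: a shared "universal"
# prompt captures task-agnostic liveness cues, and a small per-task
# prompt captures domain-specific cues; both are prepended to the
# frozen encoder's input token embeddings.
rng = np.random.default_rng(0)
embed_dim = 8

universal_prompt = rng.normal(size=(4, embed_dim))          # shared across tasks
domain_prompts = {t: rng.normal(size=(2, embed_dim))        # one per incremental task
                  for t in range(3)}

def compose_prompts(task_id, token_embeddings):
    """Prepend universal and task-specific learnable prompt vectors
    to the input token embeddings (the backbone stays frozen)."""
    return np.concatenate(
        [universal_prompt, domain_prompts[task_id], token_embeddings], axis=0
    )
```

Only the small prompt matrices would be trained per task in such a scheme, which is why prompt tuning is attractive for rehearsal-free adaptation: new domains add a few vectors instead of rewriting the backbone.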
Haoze Li
School of Computer Science, China University of Geosciences, Wuhan 430074, China
Jie Zhang
State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences (CAS), Beijing 100190, China, and also with the University of Chinese Academy of Sciences, Beijing 100049, China
Guoying Zhao
Academy Professor, IEEE Fellow, Professor of Computer Science and Engineering, University of Oulu
Affective Computing, Artificial Intelligence, Computer Vision, Pattern Recognition
Stephen Lin
Microsoft Research Asia
Computer Vision
Shiguang Shan
Professor, Institute of Computing Technology, Chinese Academy of Sciences
Computer Vision, Pattern Recognition, Machine Learning, Face Recognition