🤖 AI Summary
Face anti-spoofing systems remain vulnerable to presentation attacks such as printed photos and screen replay. To address this, we propose a live face detection method leveraging intermediate-layer features of Vision Transformers (ViTs). Departing from the dominant paradigm that relies solely on final classification logits, we systematically exploit shallow- and mid-level ViT features—characterized by both local texture sensitivity and global structural discriminability—for fine-grained spoof detection. We introduce an intermediate-feature distillation loss and a confidence-scoring mechanism for liveness estimation. Additionally, we design face-specific data augmentation strategies: semantic-preserving facial region enhancement and adaptive patch-wise masking. Evaluated on OULU-NPU and SiW benchmarks, our method achieves state-of-the-art performance against print and replay attacks, with significant improvements in AUC and HTER metrics.
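The core idea of scoring liveness from shallow and mid-level features, rather than only the final logits, can be illustrated with a toy sketch. The "blocks" below are hypothetical linear stand-ins for real ViT blocks, and the prototype-similarity score is only one plausible instantiation of a confidence-scoring mechanism; the paper's actual network and scoring rule are not specified here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for ViT blocks: each "block" is a linear map + tanh.
# (Hypothetical; the method operates on real transformer block outputs.)
dim, n_blocks = 8, 6
blocks = [rng.standard_normal((dim, dim)) / np.sqrt(dim) for _ in range(n_blocks)]

def forward_with_intermediates(x):
    """Run every block and keep each intermediate output, not just the last."""
    feats = []
    for W in blocks:
        x = np.tanh(x @ W)
        feats.append(x)
    return feats

def liveness_score(x, prototype, layers=(2, 3)):
    """Cosine similarity of pooled mid-layer features to a 'live' prototype.

    `layers` picks the shallow/mid blocks whose features are pooled;
    both the layer choice and the prototype comparison are illustrative.
    """
    feats = forward_with_intermediates(x)
    mid = np.mean([feats[i] for i in layers], axis=0)
    denom = np.linalg.norm(mid) * np.linalg.norm(prototype) + 1e-8
    return float(mid @ prototype / denom)

x = rng.standard_normal(dim)
live_prototype = rng.standard_normal(dim)
s = liveness_score(x, live_prototype)  # a scalar in [-1, 1]
```

In a real ViT this "collect every block's output" step would be done with forward hooks or a feature-extraction wrapper; the pooling-plus-comparison structure stays the same.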
📝 Abstract
Face recognition systems are designed to be robust against changes in head pose, illumination, and blur during image capture. This robustness, however, means that a malicious person who presents a face photo of a registered user may illegitimately bypass authentication. Such spoofing attacks need to be detected before face recognition is performed. In this paper, we propose a spoofing attack detection method based on the Vision Transformer (ViT) that detects minute differences between live and spoofed face images. The proposed method utilizes the intermediate features of ViT, which offer a good balance between the local and global features important for spoofing attack detection, to compute the loss during training and the score during inference. The proposed method also introduces two data augmentation techniques, face anti-spoofing data augmentation and patch-wise data augmentation, to improve the accuracy of spoofing attack detection. We demonstrate the effectiveness of the proposed method through experiments using the OULU-NPU and SiW datasets.
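The patch-wise augmentation can be sketched as randomly zeroing a fraction of non-overlapping patches on the input image grid. The patch size (16, matching a common ViT patch size) and mask ratio below are assumed illustrative values, not the paper's reported settings.

```python
import numpy as np

def patchwise_mask(image, patch=16, mask_ratio=0.3, rng=None):
    """Zero out a random fraction of non-overlapping `patch` x `patch` regions.

    image: (H, W, C) array with H and W divisible by `patch`.
    Returns a masked copy; the input is left untouched.
    """
    rng = np.random.default_rng(rng)
    h, w = image.shape[:2]
    grid_h, grid_w = h // patch, w // patch
    n_patches = grid_h * grid_w
    n_mask = int(round(n_patches * mask_ratio))
    masked = image.copy()
    # Sample distinct patch indices, then zero each selected patch.
    for i in rng.choice(n_patches, size=n_mask, replace=False):
        r, c = divmod(i, grid_w)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0
    return masked

# Example: on a 64x64 image with 16x16 patches (a 4x4 grid),
# a ratio of 0.25 masks exactly 4 of the 16 patches.
out = patchwise_mask(np.ones((64, 64, 3)), patch=16, mask_ratio=0.25, rng=0)
```

Masking whole patches (rather than individual pixels) aligns the augmentation with the ViT tokenization, forcing the model to infer liveness cues from the remaining visible patches.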