Enhancing Learnable Descriptive Convolutional Vision Transformer for Face Anti-Spoofing

📅 2025-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address insufficient discriminability between live and spoof features and poor cross-domain generalization in face anti-spoofing (FAS), this paper proposes three novel training strategies to enhance the feature representation capability of the Learnable Descriptive Convolutional Vision Transformer (LDCformer): (1) dual-attention supervision, which jointly optimizes region-level and channel-level attention maps; (2) self-challenging supervision, which generates hard samples via adversarial data augmentation to improve robustness; and (3) transitional triplet mining, which dynamically constructs cross-domain hard triplets to strengthen fine-grained discrimination. This work is the first to jointly achieve fine-grained modeling of local descriptive features and domain-generalization optimization in FAS. Extensive experiments on mainstream benchmarks, including OULU-NPU and CASIA-MFSD, demonstrate significant improvements over state-of-the-art methods in both feature discriminability and cross-domain generalization.
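The transitional triplet mining strategy summarized above dynamically selects hard triplets across domains. A minimal, framework-free sketch of the general idea, assuming Euclidean embeddings, binary live/spoof labels, and integer domain IDs (all names and the margin value are illustrative, not taken from the paper's code):

```python
import numpy as np

def cross_domain_hard_triplet_loss(embeddings, labels, domains, margin=0.3):
    """Hedged sketch of cross-domain hard-triplet mining: for each anchor,
    take the hardest (farthest) positive from a *different* domain and the
    hardest (closest) negative, then apply a standard margin-based loss."""
    # Pairwise Euclidean distance matrix between all embeddings.
    d = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
    losses = []
    for i in range(len(labels)):
        pos = (labels == labels[i]) & (domains != domains[i])  # cross-domain positives
        neg = labels != labels[i]                              # all negatives
        if not pos.any() or not neg.any():
            continue  # no valid triplet for this anchor
        hardest_pos = d[i][pos].max()  # farthest same-class, cross-domain sample
        hardest_neg = d[i][neg].min()  # closest different-class sample
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0
```

When live and spoof clusters are already well separated, the hinge term vanishes and the loss is zero; when the classes collapse together, every anchor pays the full margin.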

📝 Abstract
Face anti-spoofing (FAS) heavily relies on identifying live/spoof discriminative features to counter face presentation attacks. Recently, we proposed LDCformer, which incorporates the Learnable Descriptive Convolution (LDC) into ViT to model long-range dependencies of locally descriptive features for FAS. In this paper, we propose three novel training strategies that substantially boost LDCformer's feature characterization capability. The first strategy, dual-attention supervision, learns fine-grained liveness features guided by regional live/spoof attentions. The second, self-challenging supervision, enhances feature discriminability by generating challenging training data. The third, transitional triplet mining, narrows the cross-domain gap while maintaining the transitional relationship between live and spoof features, enlarging the domain-generalization capability of LDCformer. Extensive experiments show that LDCformer trained under joint supervision of the three strategies outperforms previous methods.
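The self-challenging supervision described in the abstract generates challenging training data so the model cannot lean on a few dominant cues. One common realization of this idea (in the spirit of representation self-challenging, not necessarily the authors' exact mechanism) mutes the feature channels with the largest gradient magnitude; a toy sketch with illustrative names and drop ratio:

```python
import numpy as np

def self_challenge(features, grads, drop_ratio=1/3):
    """Hedged sketch of self-challenging supervision: zero out the feature
    channels with the largest mean gradient magnitude so the network must
    rely on the remaining, less dominant evidence during training."""
    importance = np.abs(grads).mean(axis=0)        # per-channel importance score
    k = max(1, int(len(importance) * drop_ratio))  # number of channels to mute
    drop = np.argsort(importance)[-k:]             # indices of dominant channels
    mask = np.ones_like(importance)
    mask[drop] = 0.0
    return features * mask                         # "challenged" features
```

Training on these masked features acts as a targeted form of adversarial augmentation: the easiest discriminative evidence is removed exactly where the model depends on it most.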
Problem

Research questions and friction points this paper is trying to address.

Enhancing face anti-spoofing via discriminative feature learning
Improving LDCformer training with novel supervision strategies
Boosting domain-generalization in FAS with transitional feature mining
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates Learnable Descriptive Convolution into ViT
Uses dual-attention supervision for fine-grained features
Applies transitional triplet mining for domain-generalization
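The Learnable Descriptive Convolution incorporated into ViT builds on central-difference-style convolutions, blending a vanilla convolution response with a local-difference (texture) response. A toy single-channel NumPy sketch of that blend, using the classic fixed central-difference form for brevity (in LDC the descriptor itself is learnable; names and the scalar blend weight here are illustrative):

```python
import numpy as np

def descriptive_conv2d(x, w, theta):
    """Toy single-channel descriptive convolution (valid padding):
    blends the vanilla response with a central-difference response,
    output = (1 - theta) * vanilla + theta * cdiff."""
    H, W = x.shape
    k = w.shape[0]  # assume a square kernel, e.g. 3x3
    out = np.zeros((H - k + 1, W - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = x[i:i + k, j:j + k]
            vanilla = np.sum(patch * w)
            # Central-difference response: subtract the center pixel
            # from every position before weighting, emphasizing texture.
            cdiff = np.sum((patch - patch[k // 2, k // 2]) * w)
            out[i, j] = (1 - theta) * vanilla + theta * cdiff
    return out
```

With `theta = 0` this reduces to a plain convolution; with `theta = 1` it responds only to local intensity differences, which is why such operators are effective at picking up the fine spoof textures that intensity alone misses.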
Pei-Kai Huang
Department of Computer Science, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30013, Taiwan
Jun-Xiong Chong
Master's student, Department of Computer Science, National Tsing Hua University
Computer Vision · Face Anti-Spoofing · Vision Transformer
Ming-Tsung Hsu
Department of Computer Science, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30013, Taiwan
Fang-Yu Hsu
Department of Computer Science, National Tsing Hua University, Kuang-Fu Road, Hsinchu, 30013, Taiwan
Chiou-Ting Hsu
National Tsing Hua University
Image Analysis and Processing · Computer Vision