Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning

📅 2025-06-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing face anti-spoofing methods overfit to their training data, resulting in poor generalization to unseen attacks and cross-domain scenarios, as well as limited interpretability. Method: This paper proposes a generalizable face anti-spoofing paradigm that shifts the focus from memorizing spurious patterns to teaching multimodal large language models (MLLMs) *how to solve* the task. It introduces a reinforcement fine-tuning framework for task solving, featuring dual verifiable reward mechanisms (class consistency and reasoning consistency) and a GRPO-based optimization strategy that distills interpretable, high-reward decision rules. Crucially, the method trains without textual annotations and autonomously outputs both authenticity predictions and human-readable reasoning justifications. Results: The approach achieves state-of-the-art performance on cross-domain face anti-spoofing benchmarks, demonstrating substantial improvements in generalization to unseen attack types and target domains while providing transparent, traceable decision logic.

📝 Abstract
Recently, the emergence of novel presentation attacks has drawn increasing attention to face anti-spoofing. However, existing methods tend to memorize data patterns from the training set, resulting in poor generalization to unknown attack types across different scenarios and limited interpretability. To address these challenges, this paper presents a reinforcement fine-tuning-based face anti-spoofing method that stimulates the capability of multimodal large language models to reason about and learn how to solve the anti-spoofing task itself, rather than relying on memorized authenticity patterns. We design verifiable class-consistency and reasoning-consistency rewards, and employ a GRPO-based optimization strategy to guide the model in exploring reasoning policies from multiple perspectives so as to maximize expected rewards. Through iterative trial-and-error learning that retains only high-reward trajectories, the model distills highly generalizable decision-making rules from the extensive solution space to effectively address cross-domain face anti-spoofing. Extensive experimental results demonstrate that our method achieves state-of-the-art cross-domain generalization performance. It generalizes well to diverse unknown attack types in unseen target domains while providing interpretable reasoning for its authenticity decisions, without requiring labor-intensive textual annotations for training.
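The abstract describes a GRPO-based optimization strategy that scores sampled reasoning trajectories with verifiable class-consistency and reasoning-consistency rewards and keeps high-reward trajectories. A minimal sketch of the group-relative advantage computation at the heart of GRPO is shown below; the reward functions, field names, and equal weighting are illustrative assumptions, not details taken from the paper.

```python
import statistics

def class_consistency_reward(predicted_label: str, true_label: str) -> float:
    """Verifiable reward: 1.0 if the predicted authenticity label is correct."""
    return 1.0 if predicted_label == true_label else 0.0

def reasoning_consistency_reward(reasoning: str, predicted_label: str) -> float:
    """Illustrative stand-in: reward reasoning whose conclusion mentions the
    predicted label (a real system would apply a stricter verifiable check)."""
    return 1.0 if predicted_label in reasoning else 0.0

def grpo_advantages(rollouts, true_label, w_class=1.0, w_reason=1.0):
    """GRPO-style group-relative advantages: score each sampled trajectory
    with the combined reward, then normalize within the sampled group."""
    rewards = [
        w_class * class_consistency_reward(r["label"], true_label)
        + w_reason * reasoning_consistency_reward(r["reasoning"], r["label"])
        for r in rollouts
    ]
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # avoid division by zero
    return [(reward - mean) / std for reward in rewards]

# Hypothetical group of sampled responses for one face image labeled "spoof"
group = [
    {"label": "spoof", "reasoning": "Moire patterns suggest spoof"},
    {"label": "real",  "reasoning": "Natural skin texture suggests real"},
    {"label": "spoof", "reasoning": "Screen glare indicates spoof"},
]
adv = grpo_advantages(group, true_label="spoof")
```

Trajectories whose label and reasoning agree with the verifiable target receive positive advantages and are reinforced; inconsistent trajectories receive negative advantages, which is how iterative trial-and-error learning retains only high-reward reasoning policies.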
Problem

Research questions and friction points this paper is trying to address.

Improve generalization in cross-domain face anti-spoofing
Reduce reliance on memorized data patterns
Enhance interpretability of anti-spoofing decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement fine-tuning for face anti-spoofing
GRPO-based optimization for reward maximization
Iterative trial-and-error learning for generalization
Fangling Jiang: University of South China
Qi Li: New Laboratory of Pattern Recognition, MAIS, CASIA, Beijing, China; School of Artificial Intelligence, UCAS, Beijing, China
Weining Wang: The Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA, Beijing, China
Gang Wang: The Laboratory of Cognition and Decision Intelligence for Complex Systems, CASIA, Beijing, China
Bing Liu: School of Computer Science, University of South China, Hengyang, China
Zhenan Sun: Institute of Automation, Chinese Academy of Sciences
Biometrics; Pattern Recognition; Computer Vision