Exploring Task-Solving Paradigm for Generalized Cross-Domain Face Anti-Spoofing via Reinforcement Fine-Tuning

📅 2025-06-27

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing face anti-spoofing methods overfit to their training data, which leads to poor generalization to unseen attack types and cross-domain scenarios, as well as limited interpretability. Method: This paper proposes a face anti-spoofing paradigm that shifts the focus from memorizing authenticity patterns to teaching multimodal large language models (MLLMs) *how to solve* the task. It introduces a reinforcement fine-tuning framework with two verifiable rewards, a class-consistent reward and a reasoning-consistent reward, and uses GRPO-based optimization to explore reasoning policies and distill generalizable, high-reward decision rules. The method needs no textual annotations for training and outputs both an authenticity prediction and a human-readable justification. Results: The approach reports state-of-the-art cross-domain generalization performance, generalizing to unseen attack types in unseen target domains while providing interpretable reasoning for its decisions.

📝 Abstract

Recently, the emergence of novel presentation attacks has drawn increasing attention to face anti-spoofing. However, existing methods tend to memorize data patterns from the training set, resulting in poor generalization to unknown attack types across different scenarios and limited interpretability. To address these challenges, this paper presents a reinforcement fine-tuning-based face anti-spoofing method that stimulates the capabilities of multimodal large language models to think and learn how to solve the anti-spoofing task itself, rather than relying on the memorization of authenticity patterns. We design a verifiable class-consistent reward and a reasoning-consistent reward, and employ a GRPO-based optimization strategy to guide the model in exploring reasoning policies from multiple perspectives to maximize expected rewards. As a result, through iterative trial-and-error learning that retains only high-reward trajectories, the model distills highly generalizable decision-making rules from the extensive solution space to effectively address cross-domain face anti-spoofing tasks. Extensive experimental results demonstrate that our method achieves state-of-the-art cross-domain generalization performance. It generalizes well to diverse unknown attack types in unseen target domains while providing interpretable reasoning for its authenticity decisions, without requiring labor-intensive textual annotations for training.
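
The abstract names two verifiable rewards but does not define them, so the following is only a minimal sketch of what rule-based rewards of this kind could look like. Everything specific here is an assumption made for illustration: the `<think>`/`<answer>` output format, the `real`/`spoof` label names, the keyword-based consistency check, and the unweighted sum in `total_reward`. None of it is taken from the paper.

```python
import re

# Hypothetical output format (an assumption, not taken from the paper):
#   <think>free-form reasoning</think><answer>real|spoof</answer>
ANSWER_RE = re.compile(r"<answer>\s*(real|spoof)\s*</answer>", re.IGNORECASE)
THINK_RE = re.compile(r"<think>(.*?)</think>", re.IGNORECASE | re.DOTALL)


def parse_answer(response):
    """Return 'real' or 'spoof', or None if no well-formed answer tag exists."""
    match = ANSWER_RE.search(response)
    return match.group(1).lower() if match else None


def class_consistency_reward(response, label):
    """1.0 when the predicted class equals the ground-truth label, else 0.0."""
    return 1.0 if parse_answer(response) == label else 0.0


def reasoning_consistency_reward(response):
    """Crude illustrative proxy: 1.0 when the reasoning text mentions the
    class given in the final answer and never mentions the opposite class."""
    answer = parse_answer(response)
    think = THINK_RE.search(response)
    if answer is None or think is None:
        return 0.0
    reasoning = think.group(1).lower()
    other = "real" if answer == "spoof" else "spoof"
    mentions_answer = re.search(rf"\b{answer}\b", reasoning) is not None
    mentions_other = re.search(rf"\b{other}\b", reasoning) is not None
    return 1.0 if mentions_answer and not mentions_other else 0.0


def total_reward(response, label):
    """Unweighted sum of both rewards (the weighting is an assumption)."""
    return class_consistency_reward(response, label) + reasoning_consistency_reward(response)
```

Both rewards can be checked from the class label alone, which is consistent with the abstract's claim that no textual annotations are needed for training. A real system would likely need a more robust consistency check than keyword matching.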

Problem

Research questions and friction points this paper is trying to address.

Improve generalization in cross-domain face anti-spoofing

Reduce reliance on memorized data patterns

Enhance interpretability of anti-spoofing decisions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement fine-tuning for face anti-spoofing

GRPO-based optimization for reward maximization

Iterative trial-and-error learning for generalization
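
The GRPO step listed above can be made concrete with the group-relative advantage as it is commonly written for GRPO. This page does not give the paper's exact objective, so treat the following as the generic form under stated assumptions: $G$ is the number of responses sampled for one input, and $r_i$ is the scalar reward of the $i$-th response (assumed here to be the sum of the two rewards).

```latex
\hat{A}_i = \frac{r_i - \operatorname{mean}(r_1, \dots, r_G)}{\operatorname{std}(r_1, \dots, r_G)},
\qquad i = 1, \dots, G

% Worked example with G = 4 and rewards (2, 1, 0, 1):
%   mean = 1, population std = \sqrt{(1 + 0 + 1 + 0)/4} = \sqrt{0.5} \approx 0.707
%   \hat{A} \approx (1.41,\ 0,\ -1.41,\ 0)
```

Responses that score above their group's mean receive a positive advantage and are reinforced, while those below the mean are discouraged. This matches the summary's description of trial-and-error learning that favors high-reward trajectories.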

🔎 Similar Papers

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification