PA-FAS: Towards Interpretable and Generalizable Multimodal Face Anti-Spoofing via Path-Augmented Reinforcement Learning

📅 2025-11-22

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Multimodal face anti-spoofing (FAS) is held back by limited reasoning paths, a mismatch between supervision signals and multi-task reasoning, weak cross-domain generalization, and insufficient interpretability. To address these problems, we propose a path-augmentation mechanism and an answer-shuffling strategy that explicitly expand multimodal reasoning paths, reduce the signal mismatch between supervised fine-tuning and reinforcement learning, and suppress reliance on spurious shortcut features. We also design a module that constructs high-quality extended reasoning sequences and a cross-modal verification mechanism to make multimodal fusion more robust and consistent. Experiments show substantial improvements in reasoning accuracy and cross-domain generalization, with state-of-the-art performance across multiple benchmarks. To our knowledge, this is the first work to jointly optimize multimodal fusion capability, generalization, and interpretability within a unified framework.

📝 Abstract

Face anti-spoofing (FAS) has recently advanced in multimodal fusion, cross-domain generalization, and interpretability. With large language models and reinforcement learning (RL), strategy-based training offers new opportunities to jointly model these aspects. However, multimodal reasoning is more complex than unimodal reasoning: it requires accurate feature representation and cross-modal verification, and high-quality annotations are scarce, which makes direct application of RL sub-optimal. We identify two key limitations of supervised fine-tuning plus RL (SFT+RL) for multimodal FAS: (1) limited multimodal reasoning paths restrict the use of complementary modalities and shrink the exploration space after SFT, weakening the effect of RL; and (2) the mismatch between single-task supervision and diverse reasoning paths causes reasoning confusion, where models may exploit shortcuts by mapping images directly to answers and ignoring the intended reasoning. To address these limitations, we propose PA-FAS, which enhances reasoning paths by constructing high-quality extended reasoning sequences from limited annotations, enriching paths and relaxing exploration constraints. We further introduce an answer-shuffling mechanism during SFT to force comprehensive multimodal analysis instead of reliance on superficial cues, thereby encouraging deeper reasoning and mitigating shortcut learning. PA-FAS significantly improves multimodal reasoning accuracy and cross-domain generalization, and better unifies multimodal fusion, generalization, and interpretability for trustworthy FAS.
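
The abstract does not say how answers are shuffled, so the following is only a minimal sketch of one plausible reading: the candidate answers in each SFT sample are presented in a random order, so that the position of the correct answer carries no information and cannot serve as a shortcut. The names Sample and shuffle_answers, the lettered prompt layout, and the example question are all hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sample:
    """Hypothetical SFT sample: a question, candidate answers, and the correct one."""
    question: str
    options: List[str]
    answer: str  # must be one of `options`


def shuffle_answers(sample: Sample, rng: random.Random) -> Tuple[str, str]:
    """Return (prompt, target_letter) with the candidate answers in random order.

    Assumption: shuffling happens at the prompt level. The paper's actual
    mechanism may differ.
    """
    options = list(sample.options)  # copy so the original sample is not mutated
    rng.shuffle(options)
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    lines = [f"{letters[i]}. {opt}" for i, opt in enumerate(options)]
    prompt = sample.question + "\n" + "\n".join(lines)
    target = letters[options.index(sample.answer)]
    return prompt, target


# Illustrative usage with made-up option labels.
example = Sample(
    question="Is the presented face live or a spoof?",
    options=["live", "print attack", "replay attack", "3D mask"],
    answer="replay attack",
)
prompt, target = shuffle_answers(example, random.Random(0))
```

Drawing a fresh permutation each time a sample is seen means the correct letter varies across epochs, which is one simple way to discourage a model from memorizing answer positions.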

Problem

Research questions and friction points this paper is trying to address.

Enhances multimodal reasoning paths for face anti-spoofing systems

Addresses limited exploration space and reasoning confusion in RL training

Mitigates shortcut learning to improve cross-domain generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

Path-augmented reinforcement learning for multimodal FAS

Extended reasoning sequences from limited annotations

Answer-shuffling mechanism to prevent shortcut learning

🔎 Similar Papers

ID-Guard: A Universal Framework for Combating Facial Manipulation via Breaking Identification