AI Summary
This work addresses the challenge of unified face attack detection, which requires simultaneous identification of both physical spoofing and digital forgeries, a task where existing methods rely heavily on superficial appearance cues and lack evidence-based reasoning capabilities. To overcome this limitation, we introduce structured attack knowledge into the problem for the first time by constructing a Face Attack Knowledge Graph (FAKG) and propose a knowledge graph-driven multimodal reasoning framework. Our approach uses graph-guided question-answer data for Attack-Graph Instruction Tuning (AGIT) and adds a Graph-Consistent Reasoning Optimization mechanism (GCRO), a GRPO-based reinforcement learning objective, to enhance reasoning consistency, interpretability, and generalization. Evaluated on a multimodal unified attack detection benchmark, our method significantly outperforms both discriminative baselines and general-purpose multimodal large models across binary, coarse-grained, and fine-grained protocols, achieving higher accuracy (ACC) and lower Half Total Error Rate (HTER).
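For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how ACC and HTER are commonly computed for the binary protocol. It uses the standard definition HTER = (FAR + FRR) / 2. The label convention (1 = attack, 0 = bona fide) and the function name `acc_and_hter` are assumptions made for illustration and are not taken from the paper.

```python
def acc_and_hter(labels, preds):
    """Return (ACC, HTER) for binary attack detection.

    Assumed convention (not from the paper): 1 = attack, 0 = bona fide.
    """
    if len(labels) != len(preds):
        raise ValueError("labels and preds must have the same length")
    n_attack = sum(1 for y in labels if y == 1)
    n_live = sum(1 for y in labels if y == 0)
    if n_attack == 0 or n_live == 0:
        raise ValueError("both classes must be present to compute HTER")

    # FAR: fraction of attacks wrongly accepted as bona fide.
    far = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0) / n_attack
    # FRR: fraction of bona fide samples wrongly rejected as attacks.
    frr = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1) / n_live

    acc = sum(1 for y, p in zip(labels, preds) if y == p) / len(labels)
    hter = (far + frr) / 2
    return acc, hter


labels = [1, 1, 1, 1, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 1, 1]
acc, hter = acc_and_hter(labels, preds)
print(acc, hter)  # 0.625 0.375
```

Because HTER averages the two error rates per class, it is less sensitive than ACC to an imbalance between attack and bona fide samples.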
Abstract
Unified face attack detection (UAD) requires recognizing physical spoofing and digital forgery within a shared decision space, yet existing discriminative or prompt-based methods largely rely on appearance correlations and provide limited evidence-grounded reasoning. We propose UniShield, a knowledge-grounded multimodal reasoning framework for unified face attack defense. UniShield constructs a Face Attack Knowledge Graph (FAKG) that links attack categories to diagnostic visual cues and attack-conditioned relations, and uses it to synthesize 52,025 FAKG-QA examples for Attack-Graph Instruction Tuning (AGIT). To improve rationale consistency, we further introduce Graph-Consistent Reasoning Optimization (GCRO), a GRPO-based objective with a KG-consistency reward that encourages generated rationales to match graph-supported cues while penalizing incompatible claims. Experiments on our multimodal UAD benchmark show that UniShield achieves strong performance across binary, coarse-grained, and fine-grained protocols, with consistently high ACC and low HTER. These results suggest that structured attack knowledge can improve both detection accuracy and reasoning reliability over discriminative baselines and general-purpose MLLMs. Our code will be released at https://anonymous.4open.science/r/Unishield-A6A3/.
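The abstract does not give the exact form of the KG-consistency reward, so the following is only an illustrative sketch of the idea: reward cues in a rationale that the graph supports for the predicted attack category, penalize cues the graph marks as incompatible, and then normalize rewards within a sampled group as GRPO does. The toy graph `TOY_FAKG`, its cue names, the `penalty` weight, and both function names are hypothetical and are not taken from the paper or its code release.

```python
from statistics import mean, pstdev

# Hypothetical toy graph: attack category -> supported / incompatible cues.
TOY_FAKG = {
    "print_attack": {
        "supports": {"paper texture", "flat depth"},
        "conflicts": {"blending boundary"},
    },
    "face_swap": {
        "supports": {"blending boundary", "inconsistent lighting"},
        "conflicts": {"paper texture"},
    },
}


def kg_consistency_reward(label, cues, graph=TOY_FAKG, penalty=1.0):
    """Score the cues in a rationale against the graph node for `label`.

    Returns a value in [-penalty, 1]: +1 per supported cue, -penalty per
    incompatible cue, averaged over the distinct cues mentioned.
    """
    cues = set(cues)
    if not cues:
        return 0.0
    node = graph[label]
    supported = len(cues & node["supports"])
    incompatible = len(cues & node["conflicts"])
    return (supported - penalty * incompatible) / len(cues)


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three sampled rationales for the same image, each with its predicted label.
group = [
    ("print_attack", ["paper texture", "flat depth"]),
    ("print_attack", ["paper texture", "blending boundary"]),
    ("face_swap", ["paper texture"]),
]
rewards = [kg_consistency_reward(label, cues) for label, cues in group]
print(rewards)  # [1.0, 0.0, -1.0]
print(group_relative_advantages(rewards))
```

In this sketch the reward depends only on cue overlap. A full objective would presumably combine it with a correctness term for the predicted label, which the abstract does not detail.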