Rethinking Cross-Domain Evaluation for Face Forgery Detection with Semantic Fine-grained Alignment and Mixture-of-Experts

📅 2026-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing face forgery detection methods generalize poorly in cross-dataset scenarios, and the conventional AUC metric does not capture whether detection scores remain consistent across domains. To address these issues, this work introduces Cross-AUC, an evaluation metric designed to quantify cross-domain detection performance, and proposes the Semantic Fine-grained Alignment and Mixture-of-Experts framework (SFAM). SFAM combines CLIP-based patch-level image-text alignment with a facial-region-aware mixture-of-experts mechanism to model subtle forgery artifacts more accurately. Experiments on public benchmark datasets show that SFAM outperforms state-of-the-art methods in both cross-domain robustness and detection accuracy under several evaluation metrics.
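The facial-region mixture-of-experts idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the paper's architecture: the function name `region_moe`, the use of linear experts, the soft (softmax) gating, and the tensor shapes are all hypothetical. The summary only states that features from different facial regions are routed to specialized experts.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def region_moe(region_feats, gate_w, expert_ws):
    """Hypothetical soft mixture-of-experts over facial-region features.

    region_feats: (R, D) one feature vector per facial region (e.g. eyes, mouth)
    gate_w:       (D, E) gating weights, assumed linear
    expert_ws:    (E, D, D) one linear expert per slot, assumed for simplicity
    """
    gates = softmax(region_feats @ gate_w)                          # (R, E)
    expert_out = np.einsum("rd,edk->rek", region_feats, expert_ws)  # (R, E, D)
    mixed = (gates[:, :, None] * expert_out).sum(axis=1)            # (R, D)
    return mixed, gates

rng = np.random.default_rng(0)
R, D, E = 4, 8, 3  # regions, feature dim, experts (arbitrary toy sizes)
feats = rng.normal(size=(R, D))
mixed, gates = region_moe(feats,
                          rng.normal(size=(D, E)),
                          rng.normal(size=(E, D, D)))
```

Each row of `gates` gives one region's weighting over the experts. A sparse (top-k) router would be an equally plausible reading of the summary; soft gating is used here only because it is the shortest correct example.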

📝 Abstract
With the rapid development of generative models, visual data forgery detection plays an increasingly important role in social and economic security. Existing face forgery detectors still cannot achieve satisfactory performance because they generalize poorly across datasets. A key factor behind this is the lack of suitable metrics: the commonly used cross-dataset AUC metric fails to reveal that detection scores may shift significantly across data domains. To explicitly evaluate cross-domain score comparability, we propose Cross-AUC, an evaluation metric that computes AUC across dataset pairs by contrasting real samples from one dataset with fake samples from another (and vice versa). Evaluating representative detectors under Cross-AUC reveals substantial performance drops, exposing an overlooked robustness problem. We also propose a novel framework, Semantic Fine-grained Alignment and Mixture-of-Experts (SFAM), which consists of a patch-level image-text alignment module that enhances CLIP's sensitivity to manipulation artifacts and a facial region mixture-of-experts module that routes features from different facial regions to specialized experts for region-aware forgery analysis. Extensive qualitative and quantitative experiments on public datasets show that the proposed method outperforms state-of-the-art methods under a range of metrics.
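The cross-pairing idea behind Cross-AUC can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `auc` and `cross_auc`, the label convention (0 = real, 1 = fake), and the choice to report the two directions separately rather than aggregate them are all hypothetical, since the abstract does not specify them.

```python
import numpy as np

def auc(real_scores, fake_scores):
    # Rank-based AUC: probability that a fake sample scores higher than
    # a real sample, counting ties as one half.
    real = np.asarray(real_scores, dtype=float)
    fake = np.asarray(fake_scores, dtype=float)
    diff = fake[:, None] - real[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)

def cross_auc(scores_a, labels_a, scores_b, labels_b):
    # Assumed convention: label 0 = real, label 1 = fake.
    scores_a, labels_a = np.asarray(scores_a, dtype=float), np.asarray(labels_a)
    scores_b, labels_b = np.asarray(scores_b, dtype=float), np.asarray(labels_b)
    real_a, fake_a = scores_a[labels_a == 0], scores_a[labels_a == 1]
    real_b, fake_b = scores_b[labels_b == 0], scores_b[labels_b == 1]
    return {
        "within_a": auc(real_a, fake_a),          # conventional AUC on A
        "within_b": auc(real_b, fake_b),          # conventional AUC on B
        "real_a_vs_fake_b": auc(real_a, fake_b),  # cross pairing
        "real_b_vs_fake_a": auc(real_b, fake_a),  # cross pairing, reversed
    }

# Toy scores: the detector separates each dataset perfectly,
# but its scores on dataset B are shifted upward.
result = cross_auc(
    scores_a=[0.1, 0.2, 0.6, 0.7], labels_a=[0, 0, 1, 1],
    scores_b=[0.65, 0.75, 0.9, 0.95], labels_b=[0, 0, 1, 1],
)
print(result)
```

In this toy case both within-dataset AUCs are 1.0, yet pairing real samples from B with fake samples from A gives 0.25, because B's real faces score higher than most of A's fakes. That is the kind of score shift the conventional per-dataset AUC cannot reveal.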
Problem

Research questions and friction points this paper is trying to address.

face forgery detection
cross-domain evaluation
generalization
evaluation metric
dataset shift
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-AUC
SFAM
face forgery detection
cross-domain evaluation
Mixture-of-Experts