Seeing Through Deepfakes: A Human-Inspired Framework for Multi-Face Detection

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing deepfake detection methods suffer significant performance degradation on multi-face videos, primarily because they cannot model the contextual cues inherent in social scenes. Inspired by human visual cognition, this work first conducts systematic psychological experiments and identifies four discriminative cues: scene motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Based on these findings, the authors propose an interpretable and generalizable multi-face deepfake detection framework that jointly models multimodal features and leverages large language models (LLMs) to generate human-readable decision rationales. Evaluated on standard benchmarks, the method achieves a 3.3% improvement in in-distribution accuracy and a 2.8% gain under realistic perturbations, and it outperforms state-of-the-art approaches by 5.8% in cross-dataset generalization, demonstrating substantially enhanced robustness and decision transparency.

📝 Abstract
Multi-face deepfake videos are becoming increasingly prevalent, often appearing in natural social settings that challenge existing detection methods. Most current approaches excel at single-face detection but struggle in multi-face scenarios, due to a lack of awareness of crucial contextual cues. In this work, we develop a novel approach that leverages human cognition to analyze and defend against multi-face deepfake videos. Through a series of human studies, we systematically examine how people detect deepfake faces in social settings. Our quantitative analysis reveals four key cues humans rely on: scene-motion coherence, inter-face appearance compatibility, interpersonal gaze alignment, and face-body consistency. Guided by these insights, we introduce HICOM, a novel framework designed to detect every fake face in multi-face scenarios. Extensive experiments on benchmark datasets show that HICOM improves average accuracy by 3.3% in in-dataset detection and 2.8% under real-world perturbations. Moreover, it outperforms existing methods by 5.8% on unseen datasets, demonstrating the generalization of human-inspired cues. HICOM further enhances interpretability by incorporating an LLM to provide human-readable explanations, making detection results more transparent and convincing. Our work sheds light on involving human factors to enhance defense against deepfakes.
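As a rough illustration only, not HICOM's actual architecture, the four human-inspired cues from the abstract could be fused into a per-face suspicion score roughly as sketched below. The cue names, the weighted-average fusion, the equal default weights, and the 0.5 threshold are all assumptions for exposition; the paper's framework jointly models multimodal features rather than averaging hand-set scores.

```python
# Hypothetical sketch: fusing the four human-inspired cue scores for one face.
# Each cue score is assumed to be in [0, 1], where higher means more suspicious.
# Weights and threshold are illustrative assumptions, not from the paper.

CUE_NAMES = [
    "scene_motion_coherence",
    "inter_face_appearance_compatibility",
    "interpersonal_gaze_alignment",
    "face_body_consistency",
]

def fuse_cue_scores(cues, weights=None, threshold=0.5):
    """Combine per-face cue scores into a single fake score via a weighted average."""
    if weights is None:
        weights = {name: 1.0 / len(CUE_NAMES) for name in CUE_NAMES}  # equal weights
    score = sum(weights[name] * cues[name] for name in CUE_NAMES)
    return score, score >= threshold

# Example: a face whose gaze and face-body consistency cues look suspicious.
cues = {
    "scene_motion_coherence": 0.2,
    "inter_face_appearance_compatibility": 0.3,
    "interpersonal_gaze_alignment": 0.9,
    "face_body_consistency": 0.8,
}
score, is_fake = fuse_cue_scores(cues)  # score = 0.55, flagged as fake
```

In the paper itself these cues feed a learned multimodal model, and an LLM then turns the per-cue evidence into a human-readable rationale; this sketch only shows how independent cue scores might be combined into one decision.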
Problem

Research questions and friction points this paper is trying to address.

Detecting multi-face deepfakes in social settings
Improving accuracy in multi-face deepfake detection
Enhancing interpretability with human-inspired cues
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages human cognition for multi-face deepfake detection
Incorporates four key human-reliant cues for accuracy
Uses LLM for interpretable human-readable explanations