Hierarchical Deep Fusion Framework for Multi-dimensional Facial Forgery Detection - The 2024 Global Deepfake Image Detection Challenge

📅 2025-09-16
🤖 AI Summary
To address the digital authenticity crisis caused by the proliferation of deepfake images, this paper proposes a hierarchical deep fusion framework for detecting multiple types of facial forgery. Methodologically, it integrates four heterogeneous pre-trained architectures (Swin-MLP, CoAtNet, EfficientNetV2, and DaViT) through multi-stage fine-tuning and hierarchical feature concatenation, so that the sub-models contribute complementary representations and the combined model generalizes better. Transfer learning and ensemble optimization are performed on the MultiFFDI dataset. The resulting system achieves a score of 0.96852 on the competition's private leaderboard, ranking 20th among 184 teams. To the best of our knowledge, this is the first work to systematically combine the architectural strengths of MLPs, CNNs, and Vision Transformers (ViTs) within a single detection framework, establishing a reproducible and efficient approach to cross-architecture feature collaboration.

📝 Abstract
The proliferation of sophisticated deepfake technology poses significant challenges to digital security and authenticity. Detecting these forgeries, especially across a wide spectrum of manipulation techniques, requires robust and generalizable models. This paper introduces the Hierarchical Deep Fusion Framework (HDFF), an ensemble-based deep learning architecture designed for high-performance facial forgery detection. Our framework integrates four diverse pre-trained sub-models (Swin-MLP, CoAtNet, EfficientNetV2, and DaViT), which are fine-tuned through a multi-stage process on the MultiFFDI dataset. By concatenating the feature representations from these specialized models and training a final classifier layer on the result, HDFF leverages their collective strengths. This approach achieved a final score of 0.96852 on the competition's private leaderboard, placing 20th out of 184 teams and demonstrating the efficacy of hierarchical fusion for complex image classification tasks.
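The fusion step described in the abstract (concatenate the sub-models' features, then train a final classifier layer on them) can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the authors' implementation. The four extractors are hypothetical toy functions standing in for Swin-MLP, CoAtNet, EfficientNetV2, and DaViT. The final layer is assumed to be a single logistic-regression unit trained with SGD while the sub-models stay frozen, and the data are synthetic. Names such as `fuse_features`, `LinearHead`, and `toy_dataset` are illustrative only.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for the four fine-tuned backbones (Swin-MLP, CoAtNet,
# EfficientNetV2, DaViT). Each maps an "image" (a flat list of pixel values)
# to a small feature vector; real backbones would output learned embeddings.
FeatureExtractor = Callable[[Sequence[float]], List[float]]


def mean_feat(img: Sequence[float]) -> List[float]:
    return [sum(img) / len(img)]


def range_feat(img: Sequence[float]) -> List[float]:
    return [max(img) - min(img)]


def border_feat(img: Sequence[float]) -> List[float]:
    return [img[0], img[-1]]


def energy_feat(img: Sequence[float]) -> List[float]:
    return [sum(v * v for v in img) / len(img)]


EXTRACTORS: List[FeatureExtractor] = [mean_feat, range_feat, border_feat, energy_feat]


def fuse_features(img: Sequence[float], extractors: Sequence[FeatureExtractor]) -> List[float]:
    """Concatenate every sub-model's feature vector, in a fixed order."""
    fused: List[float] = []
    for extract in extractors:
        fused.extend(extract(img))
    return fused


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearHead:
    """Assumed final classifier layer: logistic regression on the fused features.
    The sub-models are treated as frozen, which is an assumption of this sketch."""

    def __init__(self, dim: int, lr: float = 0.1) -> None:
        self.w = [0.0] * dim
        self.b = 0.0
        self.lr = lr

    def prob_fake(self, x: Sequence[float]) -> float:
        return sigmoid(self.b + sum(wi * xi for wi, xi in zip(self.w, x)))

    def step(self, x: Sequence[float], y: int) -> None:
        # Gradient of binary cross-entropy with respect to the logit is (p - y).
        err = self.prob_fake(x) - y
        self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
        self.b -= self.lr * err


def toy_dataset(n: int, seed: int = 0) -> List[Tuple[List[float], int]]:
    """Synthetic data: 'fake' images (label 1) are brighter than 'real' ones (label 0)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        offset = 0.5 if label else 0.0
        img = [rng.random() + offset for _ in range(8)]
        data.append((img, label))
    return data


def train_and_evaluate(epochs: int = 50) -> float:
    """Train only the head on fused features and return held-out accuracy."""
    train, test = toy_dataset(200, seed=0), toy_dataset(100, seed=1)
    head = LinearHead(dim=len(fuse_features(train[0][0], EXTRACTORS)))
    rng = random.Random(42)
    for _ in range(epochs):
        rng.shuffle(train)
        for img, y in train:
            head.step(fuse_features(img, EXTRACTORS), y)
    correct = sum(
        (head.prob_fake(fuse_features(img, EXTRACTORS)) >= 0.5) == bool(y) for img, y in test
    )
    return correct / len(test)


if __name__ == "__main__":
    print(f"toy test accuracy: {train_and_evaluate():.2f}")
```

Running the script trains only the head on the toy data and prints its held-out accuracy. Concatenation keeps every sub-model's features intact and leaves it to the final layer to learn how much weight each one deserves, which is one simple way to realize the "collective strengths" idea; the paper's actual head, loss, and training schedule may differ.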
Problem

Detecting facial forgeries across a wide range of manipulation techniques
Addressing the digital-security challenges posed by deepfake technology
Improving generalization in complex image classification tasks
Innovation

Hierarchical Deep Fusion Framework (HDFF), an ensemble architecture
Four pre-trained models (Swin-MLP, CoAtNet, EfficientNetV2, DaViT), fine-tuned in multiple stages
Concatenation of sub-model features, followed by a trained final classifier layer
Authors
Kohou Wang, AI Innovation Center, China Unicom
Huan Hu, PhD student, Washington State University (research area: analog and mixed-signal IC design)
Xiang Liu, AI Innovation Center, China Unicom
Zezhou Chen, AI Innovation Center, China Unicom
Ping Chen, AI Innovation Center, China Unicom
Zhaoxiang Liu, China Unicom (research areas: Computer Vision, Deep Learning, Robotics, Human-Computer Interaction)
Shiguo Lian, CloudMinds