🤖 AI Summary
To address the digital authenticity crisis precipitated by the proliferation of deepfake images, this paper proposes a hierarchical deep fusion framework for multi-type facial forgery detection. Methodologically, it integrates four heterogeneous pre-trained architectures (Swin-MLP, CoAtNet, EfficientNetV2, and DaViT) via multi-stage fine-tuning and hierarchical feature concatenation, enabling complementary representation learning and substantially enhancing model generalization. Transfer learning and ensemble optimization are conducted on the MultiFFDI dataset. The resulting system achieves a score of 0.96852 on the competition's private leaderboard, ranking 20th among 184 teams. To the best of our knowledge, this is the first work to systematically unify the architectural strengths of MLPs, CNNs, and Vision Transformers (ViTs) within a single detection framework, establishing a reproducible and efficient paradigm for cross-architecture feature collaboration.
📝 Abstract
The proliferation of sophisticated deepfake technology poses significant challenges to digital security and authenticity. Detecting these forgeries, especially across a wide spectrum of manipulation techniques, requires robust and well-generalized models. This paper introduces the Hierarchical Deep Fusion Framework (HDFF), an ensemble-based deep learning architecture designed for high-performance facial forgery detection. Our framework integrates four diverse pre-trained sub-models (Swin-MLP, CoAtNet, EfficientNetV2, and DaViT), each fine-tuned through a multi-stage process on the MultiFFDI dataset. By concatenating the feature representations from these specialized models and training a final classifier layer on the fused features, HDFF effectively leverages their collective strengths. This approach achieved a final score of 0.96852 on the competition's private leaderboard, securing 20th place out of 184 teams and demonstrating the efficacy of hierarchical fusion for complex image classification tasks.
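The fusion step described in the abstract (extract features from each fine-tuned backbone, concatenate them, and train a final classifier on the fused vector) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the per-backbone feature dimensions and the random stand-in extractors are assumptions, since the paper excerpt does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical penultimate-layer feature sizes for the four backbones;
# illustrative values only, not taken from the paper.
FEAT_DIMS = {"swin_mlp": 768, "coatnet": 1024, "efficientnetv2": 1280, "davit": 768}

def extract_features(image, backbone):
    """Stand-in for a frozen, fine-tuned backbone: in the real framework
    this would be the backbone's penultimate-layer embedding of `image`."""
    return rng.standard_normal(FEAT_DIMS[backbone])

def fused_forgery_prob(image, weights, bias):
    """Concatenate all backbone features and apply a linear head + sigmoid,
    mirroring the 'final classifier layer' trained on the fused features."""
    fused = np.concatenate([extract_features(image, b) for b in FEAT_DIMS])
    logit = fused @ weights + bias
    return 1.0 / (1.0 + np.exp(-logit))  # probability the face is forged

total_dim = sum(FEAT_DIMS.values())        # dimensionality of the fused vector
w = rng.standard_normal(total_dim) * 0.01  # untrained head, for illustration
p_fake = fused_forgery_prob(None, w, 0.0)
```

In practice each backbone would be fine-tuned separately first, then frozen while only the fusion head is trained, which keeps the final stage cheap relative to end-to-end training of all four networks.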