Texture, Shape and Order Matter: A New Transformer Design for Sequential DeepFake Detection

📅 2024-04-22
🏛️ IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)
📈 Citations: 1
Influential: 0
🤖 AI Summary
Existing sequential deepfake detection methods predominantly formulate forgery recognition as an image-to-sequence task, relying on generic Transformer architectures with no dedicated modeling of manipulation artifacts. This work redesigns the Transformer along three axes—texture, shape, and order of manipulations—through four components: (i) a texture-aware branch built on a Diversiform Pixel Difference Attention module to capture subtle manipulation traces; (ii) a Multi-source Cross-attention module to correlate spatial and sequential features; (iii) a Shape-guided Gaussian mapping strategy that supplies priors on manipulation shape; and (iv) a reversed (backward) prediction order that exploits how later manipulations alter the traces of earlier ones. Evaluated on multiple benchmarks, the proposed method outperforms state-of-the-art approaches by a large margin in both sequential manipulation recognition accuracy and robustness.
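The texture-aware branch hinges on pixel-difference attention: weighting differences between each pixel and its neighbors, rather than raw intensities, so that the subtle texture deviations left by manipulations stand out. A minimal sketch of that idea as a central-difference convolution (hypothetical and simplified; the paper's actual Diversiform Pixel Difference Attention module is more elaborate):

```python
import numpy as np

def central_pixel_difference_conv(img, kernel):
    """Illustrative central-difference convolution (not the paper's exact
    module): each output is a weighted sum of (neighbor - center) values,
    emphasizing local texture deviations rather than absolute intensity."""
    h, w = img.shape
    k = kernel.shape[0]  # assume a square, odd-sized kernel
    r = k // 2
    out = np.zeros((h - 2 * r, w - 2 * r))
    for i in range(r, h - r):
        for j in range(r, w - r):
            patch = img[i - r:i + r + 1, j - r:j + r + 1]
            diff = patch - img[i, j]  # pixel differences vs. the center pixel
            out[i - r, j - r] = np.sum(diff * kernel)
    return out

# A perfectly flat region produces zero response; only texture variation fires.
flat = np.ones((5, 5))
kernel = np.ones((3, 3)) / 8.0
print(np.allclose(central_pixel_difference_conv(flat, kernel), 0))  # True
```

The design choice this illustrates: absolute pixel values are dominated by face content, while difference responses vanish on smooth regions and concentrate on the fine-grained texture statistics that forgeries disturb.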

📝 Abstract
Sequential DeepFake detection is an emerging task that predicts the manipulation sequence in order. Existing methods typically formulate it as an image-to-sequence problem, employing conventional Transformer architectures. However, these methods lack dedicated design and consequently achieve limited performance. This paper therefore describes a new Transformer design, called TSOM, built around three perspectives: Texture, Shape, and Order of Manipulations. Our method features four major improvements. First, a new texture-aware branch effectively captures subtle manipulation traces with a Diversiform Pixel Difference Attention module. Second, a Multi-source Cross-attention module seeks deep correlations among spatial and sequential features, enabling effective modeling of complex manipulation traces. Third, to further enhance the cross-attention, a Shape-guided Gaussian mapping strategy provides initial priors of the manipulation shape. Finally, observing that a subsequent manipulation in a sequence may influence the traces left by the preceding one, we invert the prediction order from forward to backward, leading to notable gains as expected. Extensive experimental results demonstrate that our method outperforms others by a large margin, highlighting its superiority.
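The backward-prediction idea above reduces, at the training-target level, to reversing the ground-truth manipulation sequence before decoding: the most recent manipulation left the freshest traces, so it is predicted first. A minimal sketch under assumed conventions (the token ids, padding value, and sequence length are illustrative, not the paper's):

```python
def reversed_targets(manip_seq, max_len=5, pad=0):
    """Hypothetical sketch of backward-order supervision: since a later
    manipulation can overwrite the traces of earlier ones, the decoder is
    trained to emit the sequence last-to-first. Ids/padding are illustrative."""
    rev = list(reversed(manip_seq))
    return rev + [pad] * (max_len - len(rev))

# Forward ground truth: eyebrow(2) -> lip(4) -> nose(1)
print(reversed_targets([2, 4, 1]))  # [1, 4, 2, 0, 0]
```

At inference the emitted sequence is simply re-reversed to recover the forward manipulation order.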
Problem

Research questions and friction points this paper is trying to address.

Recognizing the ordered sequence of manipulations applied to a DeepFake image
Designing a Transformer tailored to the texture, shape, and order of manipulations, rather than reusing generic architectures
Capturing subtle manipulation traces when later operations in a sequence alter the traces left by earlier ones
Innovation

Methods, ideas, or system contributions that make the work stand out.

Texture-aware branch with Diversiform Pixel Difference Attention
Multi-source Cross-attention for spatial-sequential feature correlation
Shape-guided Gaussian mapping for manipulation shape priors
Backward (reversed) prediction order exploiting the influence of later manipulations on earlier traces