🤖 AI Summary
This work addresses model provenance for face-swapping deepfake videos—specifically, fine-grained attribution to the generative model used, rather than binary fake detection. We propose a lightweight spatiotemporal modeling framework featuring a novel CNN backbone that jointly integrates spatial-temporal dual attention and multi-scale feature embedding. Trained end-to-end, it efficiently captures model-specific artifacts with high fidelity. Our method achieves a strong trade-off between accuracy and computational efficiency: on the DFDM, FaceForensics++, and FakeAVCeleb benchmarks, it outperforms state-of-the-art approaches by 3.2–5.7% in average classification accuracy and runs inference 2.1× faster, enabling real-time deployment in digital forensic applications.
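The two mechanisms named above can be illustrated with a small numerical sketch. This is not the paper's implementation: the shapes, pooling scales, and the specific attention formulations (sigmoid spatial gating, softmax temporal weighting, pyramid-style multi-scale pooling) are assumptions chosen to show the general idea of reweighting per-frame CNN features spatially and temporally, then collapsing them into one fixed-length clip descriptor.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spatial_temporal_attention(feats):
    """feats: (T, H, W, C) per-frame CNN feature maps (hypothetical shapes).

    Spatial attention: a per-pixel gate derived from channel statistics.
    Temporal attention: a per-frame weight from pooled frame descriptors.
    """
    # Spatial gate: sigmoid over channel-averaged activations -> (T, H, W, 1)
    spatial = sigmoid(feats.mean(axis=-1, keepdims=True))
    feats = feats * spatial
    # Temporal weights: softmax over global-average-pooled frame energy -> (T,)
    frame_desc = feats.mean(axis=(1, 2, 3))
    temporal = softmax(frame_desc)
    return feats * temporal[:, None, None, None]

def multiscale_embedding(feats, scales=(1, 2, 4)):
    """Average-pool the attended features at several grid sizes and
    concatenate, yielding one fixed-length clip descriptor of
    length C * sum(s**2 for s in scales)."""
    T, H, W, C = feats.shape
    clip = feats.sum(axis=0)  # temporal aggregation -> (H, W, C)
    pooled = []
    for s in scales:
        hs, ws = H // s, W // s
        # Average within each of the s x s grid cells
        grid = clip[:hs * s, :ws * s].reshape(s, hs, s, ws, C).mean(axis=(1, 3))
        pooled.append(grid.reshape(-1))
    return np.concatenate(pooled)

# Toy usage: a 16-frame clip with 8x8 feature maps and 8 channels
rng = np.random.default_rng(0)
x = rng.normal(size=(16, 8, 8, 8))
emb = multiscale_embedding(spatial_temporal_attention(x))
print(emb.shape)  # (168,) = 8 channels * (1 + 4 + 16) grid cells
```

In an actual trained model the spatial and temporal weights would be produced by learned layers rather than fixed statistics, but the data flow (gate per pixel, weight per frame, pool at multiple scales, concatenate) is the same.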
📝 Abstract
The widespread emergence of face-swap Deepfake videos poses growing risks to digital security, privacy, and media integrity, necessitating effective forensic tools for identifying the source of such manipulations. While prior research has focused primarily on binary Deepfake detection, the task of model attribution -- determining which generative model produced a given Deepfake -- remains underexplored. In this paper, we introduce FAME (Fake Attribution via Multilevel Embeddings), a lightweight and efficient spatio-temporal framework designed to capture subtle generative artifacts specific to different face-swap models. FAME integrates spatial and temporal attention mechanisms to improve attribution accuracy while remaining computationally efficient. We evaluate our model on three challenging and diverse datasets: Deepfake Detection and Manipulation (DFDM), FaceForensics++, and FakeAVCeleb. Results show that FAME consistently outperforms existing methods in both accuracy and runtime, highlighting its potential for deployment in real-world forensic and information security applications.