🤖 AI Summary
To address the growing risks of fraud and misinformation posed by proliferating deepfake conversational videos, this paper proposes a forensic method grounded in the physiological plausibility of facial biometric dynamics. Unlike conventional pattern-recognition approaches, it uses physiologically interpretable facial microstructural dynamics as discriminative cues. Specifically, a deep neural network jointly models multimodal physiological signals, including hemodynamic response and facial musculature motion, and integrates spatiotemporal consistency analysis to detect subtle anomalies inconsistent with human physiology. The method generalizes well to unseen generative models, is robust to common post-processing (e.g., video compression) and "reforging" attacks, and achieves state-of-the-art detection accuracy on major large-scale benchmarks. By rooting forgery detection in biological principles, the approach improves both the reliability and practical applicability of deepfake video authentication.
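The summary does not specify how the hemodynamic (blood-flow) cues are measured. A common building block in this line of work is a remote-photoplethysmography-style check: the green channel of a face region fluctuates slightly with the heartbeat, and a real face should concentrate that fluctuation in the human pulse band. The sketch below is an illustrative assumption, not the paper's actual pipeline; the function name, band limits, and synthetic data are all hypothetical.

```python
import numpy as np

def pulse_band_power_ratio(frames, fps=30.0, lo_hz=0.7, hi_hz=4.0):
    """Score how much of a face region's green-channel variation falls
    in the human pulse band (~42-240 bpm).

    frames: array of shape (T, H, W, 3), RGB.
    Returns a ratio in [0, 1]; faces with a visible photoplethysmographic
    signal tend to score higher than physiologically implausible video.
    """
    # Spatially average the green channel per frame -> 1-D time series.
    g = frames[..., 1].reshape(frames.shape[0], -1).mean(axis=1)
    g = g - g.mean()                      # remove the DC component
    spec = np.abs(np.fft.rfft(g)) ** 2    # power spectrum
    freqs = np.fft.rfftfreq(len(g), d=1.0 / fps)
    band = (freqs >= lo_hz) & (freqs <= hi_hz)
    total = spec[1:].sum()                # exclude the DC bin
    return float(spec[band].sum() / total) if total > 0 else 0.0

# Synthetic sanity check: a "face" whose brightness pulses at 1.2 Hz
# (72 bpm) versus one driven by broadband noise with no pulse.
rng = np.random.default_rng(0)
t = np.arange(300) / 30.0                 # 10 s of video at 30 fps
pulse = 0.5 * np.sin(2 * np.pi * 1.2 * t)
real = 128 + pulse[:, None, None, None] + rng.normal(0, 0.05, (300, 8, 8, 3))
fake = 128 + rng.normal(0, 0.5, (300, 8, 8, 3))
print(pulse_band_power_ratio(real) > pulse_band_power_ratio(fake))  # True
```

A score like this would be one weak physiological feature among several; the paper's approach, as summarized, instead learns such cues jointly with facial-motion signals inside a neural network rather than thresholding a single hand-crafted ratio.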
📝 Abstract
The combination of highly realistic voice cloning and visually compelling avatar, face-swap, or lip-sync deepfake video generation makes it relatively easy to create a video of anyone saying anything. Today, such deepfake impersonations are often used to power frauds, scams, and political disinformation. We propose a novel forensic machine learning technique for detecting deepfake video impersonations that leverages unnatural patterns in facial biometrics. We evaluate this technique across a large dataset spanning multiple deepfake techniques and impersonations, assess its robustness to video laundering, and measure its generalization to previously unseen deepfake video generators.