🤖 AI Summary
Existing video forensic methods typically target a single manipulation type (e.g., deepfakes or inpainting), which makes them inadequate in real-world scenarios where the manipulation type is not known in advance. This paper introduces MVFNet, a multipurpose video forensics network that detects multiple manipulation types, including deepfakes, inpainting, splicing, and editing, without prior knowledge of which one was used. The network extracts and jointly analyzes a broad set of forensic feature modalities that capture both spatial and temporal anomalies. A novel Multi-Scale Hierarchical Transformer module identifies forensic inconsistencies across multiple spatial scales, allowing fake content of varying shapes and sizes to be detected and localized. In experiments, the network achieves state-of-the-art performance in general scenarios where multiple manipulations are possible and rivals specialized detectors in targeted single-manipulation scenarios.
📝 Abstract
While videos can be falsified in many different ways, most existing forensic networks are specialized to detect only a single manipulation type (e.g., deepfake, inpainting). This is a significant problem because the manipulation used to falsify a video is not known a priori. To address this problem, we propose MVFNet, a multipurpose video forensics network capable of detecting multiple types of manipulations, including inpainting, deepfakes, splicing, and editing. Our network does this by extracting and jointly analyzing a broad set of forensic feature modalities that capture both spatial and temporal anomalies in falsified videos. To reliably detect and localize fake content of all shapes and sizes, our network employs a novel Multi-Scale Hierarchical Transformer module to identify forensic inconsistencies across multiple spatial scales. Experimental results show that our network obtains state-of-the-art performance in general scenarios where multiple different manipulations are possible, and rivals specialized detectors in targeted scenarios.
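The abstract does not describe the internals of the Multi-Scale Hierarchical Transformer, so the following is only a toy illustration of why looking at several spatial scales can help, not a sketch of MVFNet itself. It assumes a hypothetical per-pixel anomaly score map for one frame, averages it over non-overlapping blocks of several sizes, and fuses the results with a plain mean. The function names, the block sizes, and the fusion rule are all assumptions made for this example.

```python
def block_mean_map(score, block):
    """Average `score` over non-overlapping block x block tiles and
    broadcast each tile mean back to pixel resolution.
    (Hypothetical helper; stands in for analysis at one spatial scale.)"""
    h, w = len(score), len(score[0])
    out = [[0.0] * w for _ in range(h)]
    for top in range(0, h, block):
        for left in range(0, w, block):
            rows = range(top, min(top + block, h))
            cols = range(left, min(left + block, w))
            mean = sum(score[r][c] for r in rows for c in cols) / (len(rows) * len(cols))
            for r in rows:
                for c in cols:
                    out[r][c] = mean
    return out


def fuse_scales(score, blocks=(1, 2, 4)):
    """Fuse the per-scale maps with an unweighted mean.
    (Assumed fusion rule; a learned model would weight scales differently.)"""
    maps = [block_mean_map(score, b) for b in blocks]
    h, w = len(score), len(score[0])
    return [[sum(m[r][c] for m in maps) / len(maps) for c in range(w)]
            for r in range(h)]


# Toy 4x4 frame: a 2x2 "forged" patch in the top-left corner.
score = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
fused = fuse_scales(score)
```

In this toy frame the fused score stays high inside the patch and low elsewhere, because the fine scales (1 and 2) agree on the patch while the coarsest scale (4) only contributes a small uniform offset.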