🤖 AI Summary
Existing deepfake detection methods are severely vulnerable to localized spatiotemporal manipulations, such as micro-expression editing or object replacement, in "partially forged videos" (FakeParts). This work systematically defines the FakeParts threat model, introduces a fine-grained framework combining generative adversarial models and editing techniques to synthesize highly realistic localized forgeries, and constructs FakePartsBench, the first large-scale benchmark dataset comprising over 25,000 videos with pixel-level and frame-level manipulation annotations. Experiments show that human detection accuracy drops by more than 30% on these partial forgeries, with comparable performance degradation across state-of-the-art detection models. By establishing a rigorous evaluation protocol and providing high-quality, fine-grained ground truth, this work fills a critical gap in partial-forgery detection research and offers a foundational resource for advancing granular deepfake analysis and robust detection.
📝 Abstract
We introduce FakeParts, a new class of deepfakes characterized by subtle, localized manipulations to specific spatial regions or temporal segments of otherwise authentic videos. Unlike fully synthetic content, these partial manipulations, ranging from altered facial expressions to object substitutions and background modifications, blend seamlessly with real elements, making them particularly deceptive and difficult to detect. To address the critical gap in detection capabilities, we present FakePartsBench, the first large-scale benchmark dataset specifically designed to capture the full spectrum of partial deepfakes. Comprising over 25K videos with pixel-level and frame-level manipulation annotations, our dataset enables comprehensive evaluation of detection methods. Our user studies demonstrate that FakeParts reduces human detection accuracy by over 30% compared to traditional deepfakes, with similar performance degradation observed in state-of-the-art detection models. This work identifies an urgent vulnerability in current deepfake detection approaches and provides the necessary resources to develop more robust methods for partial video manipulations.
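The pixel-level and frame-level annotations described above suggest two natural evaluation granularities: per-frame classification accuracy and per-pixel mask overlap. As a minimal illustration (the function names, annotation schema, and thresholds here are hypothetical, not the paper's actual evaluation protocol), one could score a detector against such annotations like this:

```python
import numpy as np

def frame_level_accuracy(scores, labels, threshold=0.5):
    """Fraction of frames whose thresholded detector score matches
    the ground-truth manipulated/authentic label (1 = manipulated)."""
    preds = (np.asarray(scores, dtype=float) >= threshold).astype(int)
    return float((preds == np.asarray(labels, dtype=int)).mean())

def pixel_level_iou(pred_mask, gt_mask):
    """Intersection-over-union between a predicted and a ground-truth
    manipulation mask for a single frame."""
    pred = np.asarray(pred_mask, dtype=bool)
    gt = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both masks empty: trivially perfect agreement
    return float(np.logical_and(pred, gt).sum() / union)

# Toy example: six frames, the last three manipulated.
scores = [0.1, 0.2, 0.4, 0.8, 0.9, 0.7]
labels = [0, 0, 0, 1, 1, 1]
print(frame_level_accuracy(scores, labels))  # → 1.0

# Toy masks: predicted region over-segments the true one.
gt = np.zeros((4, 4), dtype=bool);   gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:4] = True
print(round(pixel_level_iou(pred, gt), 3))  # → 0.667
```

Evaluating at both granularities matters for partial forgeries: a detector can flag a video as fake at the frame level while still failing to localize which pixels were manipulated.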