🤖 AI Summary
Existing deepfake benchmarks primarily focus on identity swapping or localized editing and lack large-scale datasets that support human-centered, high-level semantic manipulation (such as actions, scenes, and human-object interactions) with explicit reasoning capabilities. To address this gap, we propose MultiFakeVerse, the first large-scale deepfake benchmark dedicated to human-centered semantic manipulation, comprising 845,286 images generated via vision-language models (VLMs). Our approach introduces a VLM-instruction-guided, semantics-driven generation paradigm that enables context-aware manipulations grounded in narrative intent and perceptual importance, integrating controllable image synthesis with multimodal semantic alignment. Extensive experiments demonstrate that state-of-the-art detectors and human observers struggle to identify these concept-level manipulations, confirming their strong imperceptibility. The dataset and code are publicly released.
📝 Abstract
The rapid advancement of GenAI technology over the past few years has significantly contributed to highly realistic deepfake content generation. Despite ongoing efforts, the research community still lacks a large-scale, reasoning-driven deepfake benchmark dataset specifically tailored for person-centric object, context, and scene manipulations. In this paper, we address this gap by introducing MultiFakeVerse, a large-scale person-centric deepfake dataset comprising 845,286 images, with both the manipulation suggestions and the image manipulations derived from vision-language models (VLMs). The VLM instructions specifically target modifications to individuals or to contextual elements of a scene that influence human perception of importance, intent, or narrative. This VLM-driven approach enables semantic, context-aware alterations, such as modifying actions, scenes, and human-object interactions, rather than the low-level identity swaps and region-specific edits common in existing datasets. Our experiments reveal that current state-of-the-art deepfake detection models and human observers struggle to detect these subtle yet meaningful manipulations. The code and dataset are available on [GitHub](https://github.com/Parul-Gupta/MultiFakeVerse).