MAVOS-DD: Multilingual Audio-Video Open-Set Deepfake Detection Benchmark

📅 2025-05-16
🤖 AI Summary
This work addresses the poor generalization of multilingual audio-visual deepfake detection under open-set conditions. To this end, the authors introduce MAVOS-DD, the first large-scale multilingual open-set benchmark for deepfake detection, comprising over 250 hours of real and synthetic videos spanning eight languages and seven state-of-the-art generation models. MAVOS-DD pioneers a joint "multilingual + open-set" evaluation paradigm: the training and test splits are built from disjoint language–model combinations, so that detectors face languages and generators at test time that they never saw during training, better reflecting real-world deployment. Extensive evaluation on MAVOS-DD shows that current state-of-the-art detectors suffer significant performance degradation in these open-set settings. All data, annotations, and baseline code are publicly released, providing shared infrastructure and a standardized evaluation protocol for robust deepfake detection research.
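
The "disjoint language–model combinations" idea can be made concrete with a short sketch. The snippet below is a minimal illustration only, not the benchmark's actual split logic; the sample record fields and the specific held-out language/model sets are hypothetical placeholders.

```python
# Minimal sketch of an open-set split: hold out some languages and some
# generation models from training, so the test set contains unseen
# language-model combinations. The sets and record schema below are
# hypothetical, not the benchmark's actual protocol.

TRAIN_LANGUAGES = {"english", "german", "spanish", "romanian"}   # hypothetical
TRAIN_MODELS = {"model_a", "model_b", "model_c"}                 # hypothetical

def assign_split(sample: dict) -> str:
    """Route a sample to 'train' or 'open_set_test' depending on whether
    its language and its generation model were both seen in training."""
    if sample["label"] == "real":
        # Real videos have no generator; route them by language alone here.
        return "train" if sample["language"] in TRAIN_LANGUAGES else "open_set_test"
    seen_language = sample["language"] in TRAIN_LANGUAGES
    seen_model = sample["generator"] in TRAIN_MODELS
    # Only fakes whose language AND generator were both seen go to train;
    # everything else forms the open-set evaluation pool.
    return "train" if (seen_language and seen_model) else "open_set_test"

samples = [
    {"label": "fake", "language": "english", "generator": "model_a"},
    {"label": "fake", "language": "arabic", "generator": "model_f"},
]
for s in samples:
    print(assign_split(s))  # -> train, open_set_test
```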

📝 Abstract
We present the first large-scale open-set benchmark for multilingual audio-video deepfake detection. Our dataset comprises over 250 hours of real and fake videos across eight languages, with 60% of data being generated. For each language, the fake videos are generated with seven distinct deepfake generation models, selected based on the quality of the generated content. We organize the training, validation and test splits such that only a subset of the chosen generative models and languages are available during training, thus creating several challenging open-set evaluation setups. We perform experiments with various pre-trained and fine-tuned deepfake detectors proposed in recent literature. Our results show that state-of-the-art detectors are not currently able to maintain their performance levels when tested in our open-set scenarios. We publicly release our data and code at: https://huggingface.co/datasets/unibuc-cs/MAVOS-DD.
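Since the data is released on the Hugging Face Hub, it can presumably be pulled with the standard `datasets` API. The sketch below assumes the repository is loadable via `load_dataset` with default splits; check the dataset card at the URL above for the actual schema and loading instructions.

```python
# Sketch of loading the released data with the Hugging Face `datasets`
# library. Split names and feature names are assumptions; consult
# https://huggingface.co/datasets/unibuc-cs/MAVOS-DD before relying on this.
from datasets import load_dataset

ds = load_dataset("unibuc-cs/MAVOS-DD")  # downloads data on first call
print(ds)  # shows the available splits and their features
```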
Problem

Research questions and friction points this paper is trying to address.

No large-scale multilingual audio-video deepfake detection benchmark existed prior to this work
Detectors are rarely evaluated in challenging open-set scenarios across languages
The performance drop of state-of-the-art detectors under unseen conditions is poorly quantified
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual audio-video open-set benchmark
Seven distinct deepfake generation models
Open-set evaluation with unseen generation models and languages (see the sketch below)
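
To make the reported open-set degradation concrete, here is a minimal sketch of the kind of comparison involved: scoring the same detector on an in-domain test split and on an open-set split (unseen languages/models) and comparing AUCs. The score and label arrays are hypothetical placeholders for the outputs of a real baseline detector, not results from the paper.

```python
# Sketch of quantifying open-set degradation: compute AUC for one detector
# on an in-domain test split and on an open-set split, then compare.
# All numbers below are hypothetical placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """AUC with labels 1 = fake, 0 = real; scores = predicted fakeness."""
    return roc_auc_score(labels, scores)

# Hypothetical predictions from some baseline detector on the two splits.
labels_in, scores_in = np.array([1, 0, 1, 0]), np.array([0.9, 0.2, 0.8, 0.3])
labels_open, scores_open = np.array([1, 0, 1, 0]), np.array([0.6, 0.5, 0.4, 0.7])

auc_in, auc_open = auc(labels_in, scores_in), auc(labels_open, scores_open)
print(f"in-domain AUC: {auc_in:.2f}, open-set AUC: {auc_open:.2f}, "
      f"drop: {auc_in - auc_open:.2f}")
```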