🤖 AI Summary
Existing audio deepfake detectors perform well on controlled benchmarks but exhibit severe robustness deficiencies in real-world scenarios. Method: We introduce P²V, the first practical-threat-oriented, high-fidelity detection benchmark. It uniquely integrates LLM-generated identity-consistent utterances, realistic environmental noise, adversarial perturbations, and state-of-the-art voice cloning techniques deployed between 2020 and 2025 to construct a comprehensive forgery dataset spanning these challenge dimensions. Contribution/Results: Experiments show that current SOTA detectors suffer an average 43% performance drop on P²V, revealing critical generalization gaps. In contrast, models trained on P²V achieve significantly enhanced robustness against complex attacks while preserving strong generalization on prior benchmarks, demonstrating P²V's pivotal role in advancing practically deployable deepfake detection.
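As a rough illustration of the corruption pipeline the summary describes, the sketch below mixes environmental noise into a cloned utterance at a target SNR and applies a simple bounded perturbation. All function names, parameter values, and the choice of Gaussian noise are illustrative assumptions, not P²V's actual construction procedure.

```python
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Mix environmental noise into an utterance at a target SNR (dB).
    Illustrative only: P^2V's real pipeline is not specified here.
    Both inputs are float waveforms at the same sample rate."""
    noise = np.resize(noise, speech.shape)  # loop/trim noise to match length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Scale noise so that 10*log10(p_speech / p_noise_scaled) == snr_db.
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

def simple_perturbation(audio, eps=0.002):
    """A 'simple' perturbation in the spirit of the reported attacks:
    low-amplitude Gaussian noise clipped to an eps-ball (eps is assumed)."""
    delta = np.clip(np.random.randn(*audio.shape) * eps, -eps, eps)
    return np.clip(audio + delta, -1.0, 1.0)
```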
📝 Abstract
Current audio deepfake detectors cannot be trusted. While they excel on controlled benchmarks, they fail when tested in the real world. We introduce Perturbed Public Voices (P$^{2}$V), an IRB-approved dataset capturing three critical aspects of malicious deepfakes: (1) identity-consistent transcripts via LLMs, (2) environmental and adversarial noise, and (3) state-of-the-art voice cloning (2020-2025). Experiments reveal alarming vulnerabilities of 22 recent audio deepfake detectors: models trained on current datasets suffer a 43% performance drop when tested on P$^{2}$V, with performance measured as the mean of the F1 score on deepfake audio, AUC, and 1-EER. Simple adversarial perturbations induce up to 16% performance degradation, while advanced cloning techniques reduce detectability by 20-30%. In contrast, P$^{2}$V-trained models maintain robustness against these attacks while generalizing to existing datasets, establishing a new benchmark for robust audio deepfake detection. P$^{2}$V will be publicly released upon acceptance at a conference or journal.
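For concreteness, here is a minimal sketch of the composite performance metric named in the abstract (mean of deepfake-class F1, AUC, and 1-EER), assuming scikit-learn and a 0.5 decision threshold for the F1 computation; the threshold is an assumption, as the abstract does not specify one.

```python
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score, roc_curve

def composite_score(y_true, y_score, threshold=0.5):
    """Mean of (i) F1 on the deepfake class, (ii) ROC-AUC, and (iii) 1 - EER.
    y_true: 1 = deepfake, 0 = bona fide; y_score: predicted deepfake score.
    The 0.5 threshold for binarizing scores is an assumed default."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    f1 = f1_score(y_true, y_pred, pos_label=1)
    auc = roc_auc_score(y_true, y_score)
    # EER: operating point where false-positive rate equals false-negative rate.
    fpr, tpr, _ = roc_curve(y_true, y_score)
    fnr = 1 - tpr
    eer = fpr[np.nanargmin(np.abs(fnr - fpr))]
    return (f1 + auc + (1 - eer)) / 3.0
```

Under this definition, a detector that is perfect on all three components scores 1.0, so the reported 43% drop corresponds to losing nearly half of the attainable composite score.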