LlamaPartialSpoof: An LLM-Driven Fake Speech Dataset Simulating Disinformation Generation

📅 2024-09-23
🏛️ arXiv.org
🤖 AI Summary
Existing fake speech detection systems exhibit severely limited generalization against unseen TTS models and splicing attacks—particularly those leveraging LLM-driven generation combined with multi-source voice cloning—yielding a high minimum equal-error rate (EER) of 24.49%. Method: We construct the first robustness-evaluation dataset comprising 130 hours of both fully synthetic and partially forged speech, explicitly designed from the attacker’s perspective to mirror real-world disinformation generation pipelines. We introduce the novel “partially forged speech” paradigm, systematically varying TTS architectures and splicing strategies to expose inherent biases in countermeasure (CM) systems. Contribution/Results: This work establishes a new benchmark for cross-domain robustness evaluation, enables co-modeling of attack and defense mechanisms, and provides methodological foundations for improving generalization of detection systems under realistic adversarial conditions.

📝 Abstract
Previous fake speech datasets were constructed from a defender's perspective to develop countermeasure (CM) systems, without considering the diverse motivations of attackers. To better align with real-life scenarios, we created LlamaPartialSpoof, a 130-hour dataset containing both fully and partially fake speech, generated using a large language model (LLM) and voice cloning technologies, to evaluate the robustness of CMs. By examining information valuable to both attackers and defenders, we identify several key vulnerabilities in current CM systems that can be exploited to increase attack success rates, including biases toward certain text-to-speech models or concatenation methods. Our experimental results indicate that current fake speech detection systems struggle to generalize to unseen scenarios, achieving a best performance of 24.49% equal error rate.
Problem

Research questions and friction points this paper is trying to address.

Voice Forgery
Speech Synthesis
Deep Learning Security
Innovation

Methods, ideas, or system contributions that make the work stand out.

LlamaPartialSpoof
PartiallyForgedVoices
VoiceSecurityChallenges
Hieu-Thi Luong
Nanyang Technological University, Singapore

Haoyang Li
Nanyang Technological University, Singapore

Lin Zhang
Brno University of Technology, Czech Republic

Kong Aik Lee
The Hong Kong Polytechnic University, Hong Kong

Chng Eng Siong
Nanyang Technological University, Singapore