LJ-Spoof: A Generatively Varied Corpus for Audio Anti-Spoofing and Synthesis Source Tracing

📅 2026-01-12
🤖 AI Summary
This work addresses a limitation of current audio anti-spoofing research: the lack of high-quality, speaker-specific data with systematic variation, which hinders robust spoof detection and fine-grained attribution of synthetic speech to its source. To bridge this gap, we present a large-scale, high-variability, speaker-specific corpus encompassing 30 text-to-speech (TTS) model families, 500 generative-variant subsets, and over three million utterances. The dataset systematically varies key factors (prosody, vocoders, generation hyperparameters, bona fide source speech, training strategies, and neural post-processing), enabling controlled analysis of their impact on spoofing artifacts. Designed to serve both as a training resource and as an evaluation benchmark, the corpus is intended to improve anti-spoofing performance and source-attribution accuracy in speaker-conditioned settings, addressing a gap in high-quality data for the field.

📝 Abstract
Speaker-specific anti-spoofing and synthesis-source tracing are central challenges in audio anti-spoofing. Progress has been hampered by the lack of datasets that systematically vary model architectures, synthesis pipelines, and generative parameters. To address this gap, we introduce LJ-Spoof, a speaker-specific, generatively diverse corpus that systematically varies prosody, vocoders, generative hyperparameters, bona fide prompt sources, training regimes, and neural post-processing. The corpus spans one speaker (including studio-quality recordings), 30 TTS families, 500 generative-variant subsets, 10 bona fide neural-processing variants, and more than 3 million utterances. This variation-dense design enables robust speaker-conditioned anti-spoofing and fine-grained synthesis-source tracing. We further position this dataset as both a practical reference training resource and a benchmark evaluation suite for anti-spoofing and source tracing.
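To make the two intended uses concrete, the sketch below shows how a single per-utterance metadata record could yield both a binary anti-spoofing target and a source-tracing target at several granularities. It is a minimal illustration under stated assumptions: the `UtteranceMeta` fields, the label strings, and the helper names `spoof_label` and `source_label` are hypothetical and do not reflect LJ-Spoof's actual metadata schema or file layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class UtteranceMeta:
    """Hypothetical per-utterance record; field names are illustrative only,
    not LJ-Spoof's published schema."""
    utt_id: str
    is_bona_fide: bool                     # taken as given; not inferred here
    tts_family: Optional[str] = None       # e.g. one of the 30 TTS families
    variant_subset: Optional[str] = None   # e.g. one of the 500 generative-variant subsets
    vocoder: Optional[str] = None
    post_processing: Optional[str] = None  # neural post-processing applied, if any


def spoof_label(meta: UtteranceMeta) -> int:
    """Binary anti-spoofing target: 0 = bona fide, 1 = spoof."""
    return 0 if meta.is_bona_fide else 1


def source_label(meta: UtteranceMeta, granularity: str = "family") -> Optional[str]:
    """Source-tracing target at a chosen granularity (hypothetical levels)."""
    if meta.is_bona_fide:
        return "bona_fide"
    # Map each granularity level to the corresponding metadata field.
    fields = {
        "family": meta.tts_family,
        "subset": meta.variant_subset,
        "vocoder": meta.vocoder,
    }
    if granularity not in fields:
        raise ValueError(f"unknown granularity: {granularity}")
    return fields[granularity]


if __name__ == "__main__":
    # Hypothetical identifiers, used only for illustration.
    meta = UtteranceMeta(
        utt_id="utt_0001",
        is_bona_fide=False,
        tts_family="family_A",
        variant_subset="subset_017",
        vocoder="vocoder_X",
    )
    print(spoof_label(meta), source_label(meta), source_label(meta, "subset"))
    # prints: 1 family_A subset_017
```

Because each spoofed utterance carries several nested attributes (family, subset, vocoder), one record can serve both coarse and fine-grained attribution experiments without relabeling. How the corpus itself labels neurally processed bona fide audio is not specified here, so the sketch simply takes `is_bona_fide` as given.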
Problem

Research questions and friction points this paper is trying to address.

audio anti-spoofing
speaker-specific
synthesis-source tracing
generative variation
dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

generative diversity
speaker-specific anti-spoofing
synthesis-source tracing
systematic variation
TTS corpus