🤖 AI Summary
To address the challenge of detecting subtle artifacts in generative voice spoofing, this paper proposes a model-agnostic, three-stage active amplification pipeline: (1) injecting controlled noise into the input speech; (2) using off-the-shelf speech enhancement models (e.g., DCCRN, SEGAN) to extract the injected noise together with the spoofing artifacts entangled with it; and (3) selectively amplifying the extracted traces in the feature domain. The work repurposes speech enhancement, shifting it from passive noise suppression to active artifact enhancement, without modifying downstream detector architectures and while remaining compatible with diverse anti-spoofing frameworks. Evaluated on ASVspoof2019 and ASVspoof2021, the method achieves up to 44.44% and 26.34% relative reduction in Equal Error Rate (EER), respectively, and significantly improves generalization across generative models.
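For readers unfamiliar with the metric, a relative EER reduction is the drop in EER expressed as a fraction of the baseline EER. The helper below is a minimal sketch of that arithmetic; the function name and the example values are hypothetical and are not taken from the paper.

```python
def relative_eer_reduction(baseline_eer: float, new_eer: float) -> float:
    """Return the relative EER reduction in percent.

    Both arguments are EERs in the same unit (e.g. percent).
    A positive result means the new system has a lower EER than the baseline.
    """
    if baseline_eer <= 0:
        raise ValueError("baseline_eer must be positive")
    return 100.0 * (baseline_eer - new_eer) / baseline_eer


# Hypothetical numbers for illustration only (not results from the paper):
# a baseline EER of 2.0% lowered to 1.5% is a 25% relative reduction.
example = relative_eer_reduction(2.0, 1.5)
```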
📝 Abstract
Spoofed utterances always contain artifacts introduced by generative models. While several countermeasures have been proposed to detect spoofed utterances, most focus primarily on architectural improvements. In this work, we investigate how artifacts remain hidden in spoofed speech and how to make them more prominent. We propose a model-agnostic pipeline that amplifies artifacts using speech enhancement and various types of noise. Our approach consists of three key steps: noise addition, noise extraction, and noise amplification. First, we add noise to the raw speech. Then, we apply speech enhancement to extract the entangled noise and artifacts. Finally, we amplify these extracted features. Moreover, our pipeline is compatible with different speech enhancement models and countermeasure architectures. Our method improves spoof detection performance by up to 44.44% on ASVspoof2019 and 26.34% on ASVspoof2021.
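The three steps can be sketched roughly as follows. This is an illustration under several assumptions, not the authors' implementation: the noise is white Gaussian at a chosen SNR, the extracted component is taken to be the residual between the noisy input and the enhanced output, amplification is a scalar gain, and `placeholder_enhancer` is a hypothetical moving-average stand-in for a real speech enhancement model. All function names and parameter values are invented for this sketch.

```python
import numpy as np


def add_noise(speech, snr_db, rng):
    """Step 1 (noise addition): add white Gaussian noise at a target SNR in dB."""
    signal_power = np.mean(speech ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=speech.shape)
    return speech + noise


def placeholder_enhancer(noisy, kernel=5):
    """Hypothetical stand-in for a speech enhancement model (moving average)."""
    window = np.ones(kernel) / kernel
    return np.convolve(noisy, window, mode="same")


def extract_residual(noisy, enhancer):
    """Step 2 (noise extraction): keep what the enhancer removed.

    Assumption: the removed residual carries the noise and entangled artifacts.
    """
    return noisy - enhancer(noisy)


def amplify(residual, gain):
    """Step 3 (noise amplification): scale the extracted component."""
    return gain * residual


def amplify_artifacts(speech, enhancer=placeholder_enhancer,
                      snr_db=20.0, gain=2.0, seed=0):
    """Run the three steps and return the amplified residual."""
    rng = np.random.default_rng(seed)
    noisy = add_noise(speech, snr_db, rng)
    residual = extract_residual(noisy, enhancer)
    return amplify(residual, gain)


# Usage on a synthetic one-second 16 kHz tone standing in for an utterance.
t = np.arange(16000) / 16000.0
utterance = np.sin(2 * np.pi * 220.0 * t)
features = amplify_artifacts(utterance)
```

Because the enhancer is passed in as a callable, a real enhancement model could be wrapped and substituted without touching the other steps, which mirrors the model-agnostic claim in the abstract. How the amplified output is fed to a countermeasure is not specified here and is left open in the sketch.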