🤖 AI Summary
This work addresses the threat that generative speech spoofing poses to speaker verification by proposing a cascaded spoofing-aware speaker verification (SASV) framework. The approach combines a wavelet prompt-tuned XLSR-AASIST countermeasure with a multi-model speaker verification ensemble comprising ResNet34, ResNet293, and WavLM-ECAPA-TDNN, whose scores are fused by Z-score normalization followed by score averaging. Evaluated in the WildSpoof 2026 Challenge, the system achieves a Macro a-DCF of 0.2017, an SASV EER of 2.08%, and an in-domain spoof detection EER of 0.16%. Results on unseen datasets such as ASVspoof5 show that cross-domain generalization remains a persistent challenge.
📝 Abstract
This paper describes the UZH-CL system submitted to the SASV section of the WildSpoof 2026 challenge. The challenge focuses on integrated defense against generative spoofing attacks by requiring the simultaneous verification of speaker identity and audio authenticity. We propose a cascaded Spoofing-Aware Speaker Verification framework that integrates a Wavelet Prompt-Tuned XLSR-AASIST countermeasure with a multi-model ASV ensemble. The ASV component uses the ResNet34, ResNet293, and WavLM-ECAPA-TDNN architectures, whose scores are combined by Z-score normalization followed by score averaging. Trained on VoxCeleb2 and SpoofCeleb, the system obtained a Macro a-DCF of 0.2017 and an SASV EER of 2.08%. While the system achieved a 0.16% EER in spoof detection on in-domain data, results on unseen datasets, such as ASVspoof5, highlight the critical challenge of cross-domain generalization.
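The fusion step named in the abstract, Z-score normalization followed by score averaging, can be illustrated with a minimal sketch. The abstract does not say which statistics are used for normalization, so this example assumes each system's scores are normalized over the trial list itself; the function names and the score values are hypothetical and are not taken from the submitted system.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Shift and scale one system's trial scores to zero mean and unit variance.

    Assumption: the mean and standard deviation are computed over the
    trial list itself, not over a separate cohort.
    """
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no ranking information, map everything to 0.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def fuse_scores(per_system_scores):
    """Z-normalize each system's scores, then average across systems per trial."""
    normalized = [z_normalize(scores) for scores in per_system_scores]
    n_trials = len(normalized[0])
    if any(len(scores) != n_trials for scores in normalized):
        raise ValueError("every system must score the same list of trials")
    # zip(*normalized) groups the scores that belong to the same trial.
    return [mean(trial_scores) for trial_scores in zip(*normalized)]


# Hypothetical scores for four trials from three ASV systems.
# The third system uses a different score scale on purpose.
resnet34 = [0.62, 0.10, 0.55, -0.05]
resnet293 = [0.71, 0.05, 0.48, 0.02]
wavlm_ecapa = [12.0, 3.0, 9.5, 1.5]

fused = fuse_scores([resnet34, resnet293, wavlm_ecapa])
print(fused)
```

Normalizing before averaging keeps a system with a larger raw score range (here the hypothetical third system) from dominating the fused score.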