Assessing the Impact of Speaker Identity in Speech Spoofing Detection

📅 2026-02-24
🤖 AI Summary
This study examines how speaker identity affects the embeddings learned by speech spoofing detection systems, which are commonly assumed, without verification, to be independent of the speaker. Within a Speaker-Invariant Multi-Task (SInMT) framework, it compares two contrasting strategies: one that models speaker identity in the embeddings through joint speaker recognition and spoofing detection, and one that removes speaker information using a gradient reversal layer. Evaluated on four datasets, the speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to a 48% reduction for the most challenging attacks (e.g., A11).
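
For readers unfamiliar with the metric, the equal error rate (EER) is the operating point at which the false acceptance rate and the false rejection rate are equal. The sketch below approximates it by sweeping a threshold over the observed scores. The function name, the score convention (higher means more likely bona fide), and the toy scores are assumptions for illustration and are not taken from the paper.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Assumption: a higher score means the trial is more likely bona fide.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the threshold where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Toy scores (hypothetical): one bona fide trial scores low, one spoof scores high.
bonafide = [0.9, 0.8, 0.7, 0.3]
spoof = [0.1, 0.2, 0.4, 0.6]
print(equal_error_rate(bonafide, spoof))
```

Evaluation toolkits usually interpolate between thresholds rather than picking the closest one, so values from this sketch can differ slightly from reported figures.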

📝 Abstract
Spoofing detection systems are typically trained using diverse recordings from multiple speakers, often assuming that the resulting embeddings are independent of speaker identity. However, this assumption remains unverified. In this paper, we investigate the impact of speaker information on spoofing detection systems. We propose two approaches within our Speaker-Invariant Multi-Task (SInMT) framework: one that models speaker identity within the embeddings and another that removes it. SInMT integrates multi-task learning for joint speaker recognition and spoofing detection, incorporating a gradient reversal layer. Evaluated using four datasets, our speaker-invariant model reduces the average equal error rate by 17% compared to the baseline, with up to 48% reduction for the most challenging attacks (e.g., A11).
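
As a rough illustration of the gradient reversal idea mentioned in the abstract, the pure-Python sketch below shows the general mechanism: the layer passes values through unchanged in the forward pass and flips the sign of the gradient in the backward pass, so the shared embedding is pushed away from encoding speaker identity while the spoofing head trains normally. The abstract does not describe the implementation, so the class name, the `embedding_gradient` helper, the weight `lam`, and the toy gradients are all assumptions.

```python
class GradientReversal:
    """Identity in the forward pass; scales gradients by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        # lam is a hypothetical weight controlling how strongly speaker cues are removed.
        self.lam = lam

    def forward(self, x):
        # The speaker head sees the embedding unchanged.
        return list(x)

    def backward(self, grad_output):
        # The speaker-loss gradient is sign-flipped before reaching the encoder.
        return [-self.lam * g for g in grad_output]


def embedding_gradient(grad_spoof, grad_speaker, grl):
    """Combine both task gradients as they would arrive at the shared embedding."""
    reversed_grad = grl.backward(grad_speaker)
    return [gs + gr for gs, gr in zip(grad_spoof, reversed_grad)]


# Toy gradients (hypothetical values) for a two-dimensional embedding.
grl = GradientReversal(lam=1.0)
print(embedding_gradient([0.5, -1.0], [0.25, 0.5], grl))
```

Without the reversal, the two gradients would simply add, which corresponds to a multi-task setup that keeps speaker identity in the embeddings rather than removing it.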
Problem

Research questions and friction points this paper is trying to address.

speaker identity
speech spoofing detection
speaker-invariant
embeddings
multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

speaker-invariant
multi-task learning
gradient reversal layer
speech spoofing detection
speaker identity
Anh-Tuan Dao
Laboratoire d’informatique d’Avignon, France
Driss Matrouf
Laboratoire d’informatique d’Avignon, France
Nicholas Evans
Professor, Audio Security and Privacy, EURECOM, France
speaker recognition, anti-spoofing, presentation attack detection, privacy preservation, pseudonymisation