Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses unsupervised speaker verification by proposing a robust self-supervised framework designed to narrow the performance gap between self-supervised and fully supervised methods. The approach builds on the non-contrastive Self-Distillation Prototypes Network (SDPN) and integrates two key innovations: (i) dimension-wise regularization of the speaker embeddings to mitigate embedding-space collapse, and (ii) EER-based score normalization, a scoring calibration technique adopted from fully supervised SV, to enhance discriminability. The authors describe this as the first work to jointly incorporate these mechanisms into SDPN for speaker verification. Evaluated on the VoxCeleb1-O, -E, and -H test sets, the method achieves EERs of 1.29%, 1.60%, and 2.80%, respectively, corresponding to relative improvements of 28.3%, 19.6%, and 22.6% over prior state-of-the-art self-supervised approaches. These results establish a new self-supervised SOTA in speaker verification.
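The Equal Error Rate (EER) figures quoted above are the operating point at which the false-acceptance and false-rejection rates coincide. As a generic illustration (not code from the paper), EER can be computed from the target and non-target trial scores like this:

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate: the point where the false-acceptance rate
    equals the false-rejection rate as the decision threshold sweeps
    down through the sorted scores."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(-scores)  # descending: highest scores accepted first
    labels = labels[order]
    # Cumulative error rates at each candidate threshold.
    fa = np.cumsum(1 - labels) / max(len(nontarget_scores), 1)  # false accepts
    fr = 1.0 - np.cumsum(labels) / max(len(target_scores), 1)   # false rejects
    idx = np.argmin(np.abs(fa - fr))
    return (fa[idx] + fr[idx]) / 2.0
```

With perfectly separated scores the EER is 0; heavy overlap between target and impostor scores pushes it toward 50%.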


📝 Abstract
Developing robust speaker verification (SV) systems without speaker labels has been a longstanding challenge. Earlier research has highlighted a considerable performance gap between self-supervised and fully supervised approaches. In this paper, we enhance the non-contrastive self-supervised framework, Self-Distillation Prototypes Network (SDPN), by introducing dimension regularization that explicitly addresses the collapse problem through the application of regularization terms to speaker embeddings. Moreover, we integrate score normalization techniques from fully supervised SV to further bridge the gap toward supervised verification performance. SDPN with dimension regularization and score normalization sets a new state of the art on the VoxCeleb1 speaker verification benchmark, achieving Equal Error Rates of 1.29%, 1.60%, and 2.80% on the VoxCeleb1-{O,E,H} trials, respectively. These results represent relative improvements of 28.3%, 19.6%, and 22.6% over the current best self-supervised methods, thereby advancing the frontiers of SV technology.
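The collapse problem mentioned in the abstract occurs when embedding dimensions become constant or redundant across a batch. The paper's exact regularizer is not detailed in this summary; the sketch below shows one common form of dimension-wise regularization (a VICReg-style variance floor plus off-diagonal covariance penalty), purely as an illustration of the idea:

```python
import numpy as np

def dimension_regularization(embeddings, eps=1e-4):
    """Illustrative dimension-wise regularizer (assumed form, not the
    paper's exact loss): decorrelate embedding dimensions and keep each
    dimension's variance above a floor so the embedding space does not
    collapse.  `embeddings`: (batch, dim) array of speaker embeddings."""
    n, d = embeddings.shape
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)
    # Off-diagonal covariance penalty: discourage redundant dimensions.
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    # Variance hinge: push each dimension's std above 1.
    std = np.sqrt(np.diag(cov) + eps)
    var_loss = np.maximum(0.0, 1.0 - std).mean()
    return cov_loss + var_loss
```

A fully collapsed batch (all embeddings identical) incurs a large penalty, while a batch whose dimensions are decorrelated and well spread incurs essentially none.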
Problem

Research questions and friction points this paper is trying to address.

- Enhancing self-supervised speaker verification without speaker labels
- Addressing embedding collapse via dimension regularization
- Closing the gap to fully supervised performance via score normalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Enhances SDPN with dimension regularization on speaker embeddings
- Applies score normalization techniques from fully supervised SV
- Achieves state-of-the-art self-supervised speaker verification on VoxCeleb1
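Score normalization, as borrowed from supervised SV pipelines, calibrates each raw trial score against an impostor cohort. A widely used variant is adaptive symmetric normalization (AS-Norm); the summary does not specify which variant the paper uses, so the following is a generic sketch under that assumption:

```python
import numpy as np

def adaptive_s_norm(score, enroll_cohort, test_cohort, top_k=200):
    """Adaptive symmetric score normalization (AS-Norm sketch): z-normalize
    a raw trial score with the statistics of each side's top-k most
    similar cohort scores, then average the two normalized scores.
    `enroll_cohort` / `test_cohort`: scores of the enrollment and test
    utterances against an impostor cohort."""
    def z(s, cohort):
        top = np.sort(cohort)[-top_k:]  # keep the k closest impostors
        return (s - top.mean()) / (top.std() + 1e-8)
    return 0.5 * (z(score, enroll_cohort) + z(score, test_cohort))
```

Selecting only the top-k cohort scores adapts the normalization statistics to each trial, which is what distinguishes AS-Norm from plain S-Norm.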