Pushing the Frontiers of Self-Distillation Prototypes Network with Dimension Regularization and Score Normalization

📅 2025-05-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses unsupervised speaker verification by proposing a robust self-supervised framework designed to narrow the performance gap between self-supervised and fully supervised methods. The approach builds on the non-contrastive Self-Distillation Prototypes Network (SDPN) and integrates two key innovations: (i) dimension-wise regularization of the speaker embeddings to mitigate embedding-space collapse, and (ii) EER-based score normalization, a scoring calibration technique adopted from fully supervised SV, to enhance discriminability. The authors describe this as the first work to jointly incorporate these mechanisms into SDPN for speaker verification. Evaluated on the VoxCeleb1-O, -E, and -H test sets, the method achieves EERs of 1.29%, 1.60%, and 2.80%, respectively, corresponding to relative improvements of 28.3%, 19.6%, and 22.6% over prior state-of-the-art self-supervised approaches. These results establish a new self-supervised SOTA in speaker verification.
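The Equal Error Rate (EER) figures quoted above are the operating point at which the false-acceptance and false-rejection rates coincide. As a generic illustration (not code from the paper), EER can be computed from the target and non-target trial scores like this:

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Equal Error Rate: the point where the false-acceptance rate
    equals the false-rejection rate as the decision threshold sweeps
    down through the sorted scores."""
    scores = np.concatenate([target_scores, nontarget_scores])
    labels = np.concatenate([np.ones(len(target_scores)),
                             np.zeros(len(nontarget_scores))])
    order = np.argsort(-scores)  # descending: highest scores accepted first
    labels = labels[order]
    # Cumulative error rates at each candidate threshold.
    fa = np.cumsum(1 - labels) / max(len(nontarget_scores), 1)  # false accepts
    fr = 1.0 - np.cumsum(labels) / max(len(target_scores), 1)   # false rejects
    idx = np.argmin(np.abs(fa - fr))
    return (fa[idx] + fr[idx]) / 2.0
```

With perfectly separated scores the EER is 0; heavy overlap between target and impostor scores pushes it toward 50%.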


📝 Abstract
Developing robust speaker verification (SV) systems without speaker labels has been a longstanding challenge. Earlier research has highlighted a considerable performance gap between self-supervised and fully supervised approaches. In this paper, we enhance the non-contrastive self-supervised framework, Self-Distillation Prototypes Network (SDPN), by introducing dimension regularization that explicitly addresses the collapse problem through the application of regularization terms to speaker embeddings. Moreover, we integrate score normalization techniques from fully supervised SV to further bridge the gap toward supervised verification performance. SDPN with dimension regularization and score normalization sets a new state of the art on the VoxCeleb1 speaker verification benchmark, achieving Equal Error Rates of 1.29%, 1.60%, and 2.80% on the VoxCeleb1-{O,E,H} trials, respectively. These results represent relative improvements of 28.3%, 19.6%, and 22.6% over the current best self-supervised methods, thereby advancing the frontiers of SV technology.
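The collapse problem mentioned in the abstract occurs when embedding dimensions become constant or redundant across a batch. The paper's exact regularizer is not detailed in this summary; the sketch below shows one common form of dimension-wise regularization (a VICReg-style variance floor plus off-diagonal covariance penalty), purely as an illustration of the idea:

```python
import numpy as np

def dimension_regularization(embeddings, eps=1e-4):
    """Illustrative dimension-wise regularizer (assumed form, not the
    paper's exact loss): decorrelate embedding dimensions and keep each
    dimension's variance above a floor so the embedding space does not
    collapse.  `embeddings`: (batch, dim) array of speaker embeddings."""
    n, d = embeddings.shape
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (n - 1)
    # Off-diagonal covariance penalty: discourage redundant dimensions.
    off_diag = cov - np.diag(np.diag(cov))
    cov_loss = (off_diag ** 2).sum() / d
    # Variance hinge: push each dimension's std above 1.
    std = np.sqrt(np.diag(cov) + eps)
    var_loss = np.maximum(0.0, 1.0 - std).mean()
    return cov_loss + var_loss
```

A fully collapsed batch (all embeddings identical) incurs a large penalty, while a batch whose dimensions are decorrelated and well spread incurs essentially none.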
Problem

Research questions and friction points this paper is trying to address.

- Enhancing self-supervised speaker verification without speaker labels
- Addressing embedding collapse via dimension regularization
- Closing the gap to fully supervised performance via score normalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

- Enhances SDPN with dimension regularization on speaker embeddings
- Applies score normalization techniques from fully supervised SV
- Achieves state-of-the-art self-supervised speaker verification on VoxCeleb1
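Score normalization, as borrowed from supervised SV pipelines, calibrates each raw trial score against an impostor cohort. A widely used variant is adaptive symmetric normalization (AS-Norm); the summary does not specify which variant the paper uses, so the following is a generic sketch under that assumption:

```python
import numpy as np

def adaptive_s_norm(score, enroll_cohort, test_cohort, top_k=200):
    """Adaptive symmetric score normalization (AS-Norm sketch): z-normalize
    a raw trial score with the statistics of each side's top-k most
    similar cohort scores, then average the two normalized scores.
    `enroll_cohort` / `test_cohort`: scores of the enrollment and test
    utterances against an impostor cohort."""
    def z(s, cohort):
        top = np.sort(cohort)[-top_k:]  # keep the k closest impostors
        return (s - top.mean()) / (top.std() + 1e-8)
    return 0.5 * (z(score, enroll_cohort) + z(score, test_cohort))
```

Selecting only the top-k cohort scores adapts the normalization statistics to each trial, which is what distinguishes AS-Norm from plain S-Norm.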