What and When to Learn: CURriculum Ranking Loss for Large-Scale Speaker Verification

📅 2026-03-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the vulnerability of conventional fixed-margin losses to noisy or mislabeled samples in large-scale speaker verification, which compromises embedding-space compactness. To improve robustness, the authors propose Curry (CURriculum Ranking), an adaptive loss function that, for the first time, estimates sample difficulty online from the confidence scores of Sub-center ArcFace, without requiring additional annotations. Difficulty is dynamically binned into easy, medium, and hard tiers using batch-wise statistics, enabling progressive curriculum learning with adaptive weighting. The method substantially improves robustness, reducing equal error rate (EER) by 86.8% on VoxCeleb1-O and 60.0% on SITW relative to the Sub-center ArcFace baseline, in what the authors describe as the largest-scale speaker verification system trained to date.

๐Ÿ“ Abstract
Speaker verification at large scale remains an open challenge, as fixed-margin losses treat all samples equally regardless of quality. We hypothesize that mislabeled or degraded samples introduce noisy gradients that disrupt compact speaker manifolds. We propose Curry (CURriculum Ranking), an adaptive loss that estimates sample difficulty online via Sub-center ArcFace: confidence scores from dominant sub-center cosine similarity rank samples into easy, medium, and hard tiers using running batch statistics, without auxiliary annotations. Learnable weights guide the model from stable identity foundations through manifold refinement to boundary sharpening. To our knowledge, this is the largest-scale speaker verification system trained to date. Evaluated on VoxCeleb1-O and SITW, Curry reduces EER by 86.8% and 60.0% over the Sub-center ArcFace baseline, establishing a new paradigm for robust speaker verification on imperfect large-scale data.
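The difficulty tiering described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the confidence measure (max cosine similarity to a sample's own sub-centers) matches the description, but the specific thresholds (batch mean and standard deviation), the use of per-batch rather than running statistics, and the fixed tier weights are assumptions made for the example.

```python
import numpy as np

def dominant_subcenter_confidence(embeddings, subcenters, labels):
    """Confidence = max cosine similarity to the sample's own class sub-centers.

    embeddings: (B, D) L2-normalized speaker embeddings
    subcenters: (C, K, D) L2-normalized sub-centers (K per class)
    labels:     (B,) ground-truth class indices
    """
    sims = np.einsum('bd,bkd->bk', embeddings, subcenters[labels])
    return sims.max(axis=1)  # dominant sub-center similarity per sample

def assign_tiers(conf):
    """Tier samples by batch statistics: 0 = easy, 1 = medium, 2 = hard.

    Illustrative rule (an assumption, not the paper's exact scheme):
    easy if conf >= batch mean; hard if conf < mean - std; else medium.
    """
    mu, sigma = conf.mean(), conf.std()
    tiers = np.full(conf.shape, 1, dtype=int)  # default: medium
    tiers[conf >= mu] = 0                      # easy
    tiers[conf < mu - sigma] = 2               # hard
    return tiers

def curriculum_weighted_loss(per_sample_loss, tiers, weights):
    """Weighted mean of per-sample losses; weights[t] scales tier t.

    In the paper these weights are learnable; here they are fixed inputs.
    """
    return float((weights[tiers] * per_sample_loss).mean())

# Example: four samples with descending confidence.
conf = np.array([0.9, 0.8, 0.5, 0.1])
tiers = assign_tiers(conf)          # -> [0, 0, 1, 2]
loss = curriculum_weighted_loss(np.ones(4), tiers, np.array([1.0, 2.0, 3.0]))
```

Upweighting the hard tier late in training (or downweighting it early) is what lets a curriculum of this shape move from stable easy samples toward boundary sharpening.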
Problem

Research questions and friction points this paper is trying to address.

speaker verification
large-scale
noisy labels
sample quality
fixed-margin loss
Innovation

Methods, ideas, or system contributions that make the work stand out.

Curriculum Learning
Adaptive Loss
Speaker Verification
Sub-center ArcFace
Large-scale Learning