🤖 AI Summary
To address the distribution mismatch between training and test utterances caused by channel variability in speaker verification (SV), this paper proposes JPOT-PL, an unsupervised domain adaptation method. JPOT-PL combines the geometry-aware distance of Partial Optimal Transport (POT) with soft pseudo-speaker labels derived from the optimal coupling, so that distribution alignment and discriminative learning are optimized together. This addresses a limitation of conventional domain adaptation methods, which do not account for the complex structure of the data distributions during alignment. In experiments on the SV channel adaptation task with VoxCeleb as the base corpus, JPOT-PL reduces the Equal Error Rate (EER) by over 10% compared with several state-of-the-art channel adaptation algorithms.
📝 Abstract
A domain gap often degrades the performance of speaker verification (SV) systems when the statistical distributions of the training data and real-world test speech are mismatched. Channel variation, a primary cause of this gap, has received less attention than other factors (e.g., noise). Although various domain adaptation algorithms could be applied to this problem, most cannot account for the complex distribution structure when combining domain alignment with discriminative learning. In this paper, we propose a novel unsupervised domain adaptation method, Joint Partial Optimal Transport with Pseudo Label (JPOT-PL), to alleviate the channel mismatch problem. Leveraging the geometry-aware distance metric of optimal transport for distribution alignment, we further design a pseudo-label-based discriminative learning scheme in which the pseudo label can be regarded as a new type of soft speaker label derived from the optimal coupling. We evaluate JPOT-PL on the SV channel adaptation task with VoxCeleb as the base corpus. Experiments show that our method reduces the EER by over 10% compared with several state-of-the-art channel adaptation algorithms.
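To make the idea of a soft speaker label derived from the optimal coupling more concrete, the following is a minimal NumPy sketch, not the paper's implementation. It rests on several assumptions that the abstract does not specify: the partial transport plan is approximated with entropic Sinkhorn iterations on a problem padded with dummy points, the cost is squared Euclidean distance between toy 2-D "embeddings", and the pseudo label of a target sample is the plan-weighted average of the source speakers' one-hot labels. The function names (`sinkhorn`, `partial_ot_plan`, `soft_pseudo_labels`) and all parameter values are hypothetical.

```python
import numpy as np

def sinkhorn(a, b, cost, reg, n_iter=2000):
    """Entropic OT plan between histograms a and b (plain Sinkhorn iterations)."""
    K = np.exp(-cost / reg)
    u = np.ones_like(a)
    for _ in range(n_iter):
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def partial_ot_plan(a, b, cost, mass, reg=0.1, big=10.0):
    """Approximate partial OT: transport only `mass` of the total.

    Assumption: uses the dummy-point trick. One dummy source and one dummy
    target absorb the untransported mass at zero cost; a large dummy-to-dummy
    cost discourages them from exchanging mass with each other.
    """
    n, m = cost.shape
    ext = np.zeros((n + 1, m + 1))
    ext[:n, :m] = cost
    ext[n, m] = big
    a_ext = np.append(a, b.sum() - mass)
    b_ext = np.append(b, a.sum() - mass)
    return sinkhorn(a_ext, b_ext, ext, reg)[:n, :m]  # drop the dummy row/column

def soft_pseudo_labels(plan, src_labels, n_speakers):
    """Soft speaker label per target sample: plan-weighted mix of source labels."""
    onehot = np.eye(n_speakers)[src_labels]      # shape (n_source, n_speakers)
    scores = plan.T @ onehot                     # shape (n_target, n_speakers)
    return scores / scores.sum(axis=1, keepdims=True)

# Toy data (hypothetical): two speakers, target shifted by a "channel" offset.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0]])
src_labels = np.repeat([0, 1], 5)
tgt_labels = np.repeat([0, 1], 5)                # used only to check the result
src = centers[src_labels] + 0.3 * rng.standard_normal((10, 2))
tgt = centers[tgt_labels] + 0.3 * rng.standard_normal((10, 2)) + np.array([0.5, -0.3])

# Squared Euclidean cost, scaled to [0, 1] so that `reg` has a stable meaning.
cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
cost /= cost.max()

a = np.full(10, 0.1)                             # uniform source weights
b = np.full(10, 0.1)                             # uniform target weights
plan = partial_ot_plan(a, b, cost, mass=0.8)     # transport 80% of the mass
pseudo = soft_pseudo_labels(plan, src_labels, n_speakers=2)
```

In a training loop, a coupling of this kind would presumably feed two terms: an alignment loss such as `(plan * cost).sum()`, and a classification loss on target samples that uses `pseudo` as soft targets. How JPOT-PL weights and combines these terms is not stated in the abstract.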