Channel Adaptation for Speaker Verification Using Optimal Transport with Pseudo Label

📅 2024-09-14
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the distribution mismatch between training and test utterances caused by channel variability in speaker verification (SV), this paper proposes JPOT-PL (Joint Partial Optimal Transport with Pseudo Label), an unsupervised domain adaptation method. JPOT-PL combines the geometry-aware distance of partial optimal transport (POT) with soft pseudo speaker labels derived from the optimal coupling, so that distribution alignment and discriminative learning are optimized together; the authors argue that most existing adaptation algorithms do not account for the complex distribution structure when combining the two. On an SV channel adaptation task built on VoxCeleb, the method is reported to reduce the Equal Error Rate (EER) by over 10% compared with several state-of-the-art channel adaptation algorithms.
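Since the headline result is stated in terms of EER, a minimal sketch of how that metric can be computed from trial scores may help. This is a generic threshold sweep, not the paper's evaluation code; the function names `equal_error_rate` and `relative_reduction` are hypothetical. The source does not say whether the 10% figure is a relative or an absolute reduction, so the second helper computes the relative version purely as an illustration.

```python
import numpy as np

def equal_error_rate(scores, labels):
    """Approximate EER: the point where false-accept and false-reject rates meet.

    scores: similarity score per trial (higher means "same speaker").
    labels: 1/True for target (same-speaker) trials, 0/False otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    target = scores[labels]
    nontarget = scores[~labels]

    # Sweep every observed score as a decision threshold (accept if score >= t).
    thresholds = np.unique(scores)
    far = np.array([(nontarget >= t).mean() for t in thresholds])  # false accepts
    frr = np.array([(target < t).mean() for t in thresholds])      # false rejects

    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

def relative_reduction(baseline_eer, new_eer):
    """Relative EER reduction (assumption: the summary may mean this, or may not)."""
    return (baseline_eer - new_eer) / baseline_eer
```

For example, `equal_error_rate([0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1], [1, 1, 1, 0, 1, 0, 0, 0])` evaluates one target trial scored below one non-target trial, which gives matching false-accept and false-reject rates at a threshold of 0.6.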

📝 Abstract
A domain gap often degrades the performance of speaker verification (SV) systems when the statistical distributions of training data and real-world test speech are mismatched. Channel variation, a primary factor causing this gap, is less addressed than other issues (e.g., noise). Although various domain adaptation algorithms could be applied to this problem, most cannot account for the complex distribution structure when combining domain alignment with discriminative learning. In this paper, we propose a novel unsupervised domain adaptation method, Joint Partial Optimal Transport with Pseudo Label (JPOT-PL), to alleviate the channel mismatch problem. Leveraging the geometry-aware distance metric of optimal transport for distribution alignment, we further design pseudo label-based discriminative learning, where the pseudo label can be regarded as a new type of soft speaker label derived from the optimal coupling. Using JPOT-PL, we carry out experiments on the SV channel adaptation task with VoxCeleb as the basis corpus. Experiments show that our method reduces EER by over 10% compared with several state-of-the-art channel adaptation algorithms.
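The idea of a soft speaker label "derived from the optimal coupling" can be illustrated with a small sketch. Note the assumptions: the abstract does not give the formulation, so this example swaps in balanced entropic optimal transport (plain Sinkhorn iterations) instead of partial optimal transport, uses a squared Euclidean cost on L2-normalised embeddings, and propagates source labels through the coupling by column-normalised averaging. The names `sinkhorn` and `soft_pseudo_labels` and the `reg` parameter are hypothetical.

```python
import numpy as np

def sinkhorn(cost, a, b, reg=0.1, n_iters=200):
    """Entropic OT coupling between histograms a (rows) and b (columns)."""
    K = np.exp(-cost / reg)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)            # rescale to match column marginals
        u = a / (K @ v)              # rescale to match row marginals
    return u[:, None] * K * v[None, :]

def soft_pseudo_labels(src_emb, src_labels, tgt_emb, n_speakers, reg=0.1):
    """Transport labelled source embeddings onto unlabelled target embeddings.

    Returns the coupling (n_src, n_tgt) and soft labels (n_tgt, n_speakers).
    """
    src = src_emb / np.linalg.norm(src_emb, axis=1, keepdims=True)
    tgt = tgt_emb / np.linalg.norm(tgt_emb, axis=1, keepdims=True)

    # Pairwise squared Euclidean cost (an assumed choice of ground metric).
    cost = ((src[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)

    # Uniform mass on every utterance in both domains.
    a = np.full(len(src), 1.0 / len(src))
    b = np.full(len(tgt), 1.0 / len(tgt))
    plan = sinkhorn(cost, a, b, reg=reg)

    # Each target utterance inherits the label mass of the sources coupled to it.
    onehot = np.eye(n_speakers)[np.asarray(src_labels)]
    soft = plan.T @ onehot
    soft /= soft.sum(axis=1, keepdims=True)
    return plan, soft
```

In a partial optimal transport setting only a fraction of the total mass would be moved, so some target utterances could stay effectively unmatched; the balanced version above forces every utterance to be transported and is only meant to show how a coupling can yield soft labels.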
Problem

Research questions and friction points this paper is trying to address.

Addresses channel-induced domain gap in speaker verification
Proposes unsupervised adaptation using optimal transport and pseudo labels
Reduces verification errors by aligning mismatched training-test distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unsupervised domain adaptation with optimal transport
Pseudo label-based discriminative learning
Joint Partial Optimal Transport with Pseudo Label
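
How an alignment term and a pseudo-label classification term might be combined into one objective can be sketched as follows. This is an assumption-laden illustration, not the paper's loss: the alignment term is taken to be the transport cost under a given coupling, the classification term is a cross-entropy against soft labels, and the trade-off `weight` and both function names are hypothetical.

```python
import numpy as np

def soft_cross_entropy(logits, soft_labels):
    """Mean cross-entropy between softmax(logits) and soft target labels."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(soft_labels * log_probs).sum(axis=1).mean())

def joint_loss(plan, cost, logits, soft_labels, weight=1.0):
    """Transport cost under the coupling plus a weighted soft-label loss."""
    alignment = float((plan * cost).sum())
    return alignment + weight * soft_cross_entropy(logits, soft_labels)
```

In a training loop both terms would be computed on mini-batches with an autodiff framework so that gradients reach the speaker encoder; NumPy is used here only to keep the arithmetic visible.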