Mitigating Language Mismatch in SSL-Based Speaker Anonymization

📅 2025-07-01

🤖 AI Summary

Existing speaker anonymization systems (SASs) are predominantly designed for English and degrade substantially on non-English languages such as Japanese and Mandarin, a language mismatch problem. To address this, the paper adapts the self-supervised learning (SSL)-based content encoder of an SAS in two steps: an English-only SSL model is first fine-tuned on Japanese speech to verify that language adaptation works, and a multilingual SSL model is then fine-tuned on Japanese speech and evaluated on both Japanese and Mandarin. The system is assessed on two axes: privacy (how hard it is to recover the speaker's identity) and intelligibility (how faithfully the linguistic content is preserved). Experimental results show that this approach improves intelligibility for Japanese and Mandarin, reducing word error rate (WER) by up to 28%, while maintaining speaker anonymity (equal error rate < 1.5%). The findings support target-language fine-tuning and multilingual pre-training of SSL models as effective ways to extend SASs across languages.
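The two metrics named in the summary, WER and equal error rate (EER), can be illustrated with a minimal Python sketch. This is not the paper's evaluation pipeline: the function names `error_rate` and `equal_error_rate` are hypothetical, the EER is a coarse threshold sweep over the observed scores, and scoring Japanese or Mandarin at the character level (rather than the word level) is an assumption made here because those scripts are not whitespace-delimited.

```python
from typing import Sequence


def error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Token error rate: Levenshtein distance divided by reference length.

    Pass word lists for WER, or character lists for a character-level rate.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis tokens.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(reference)


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER: the point where false acceptance and false rejection
    rates are closest, found by sweeping thresholds over the observed scores."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


if __name__ == "__main__":
    # Word-level example with one substituted word out of three.
    print(error_rate("the cat sat".split(), "the dog sat".split()))
    # Character-level example (assumed tokenization for Japanese).
    print(error_rate(list("こんにちは"), list("こんにちわ")))
    # Toy speaker-verification scores: same-speaker vs. different-speaker trials.
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))
```

In this sketch, intelligibility would be measured by running a speech recognizer on the anonymized audio and passing its transcript as `hypothesis`, while the EER would be computed from the similarity scores of a speaker verification system on anonymized trials.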

📝 Abstract

Speaker anonymization aims to protect speaker identity while preserving content information and the intelligibility of speech. However, most speaker anonymization systems (SASs) are developed and evaluated using only English, resulting in degraded utility for other languages. This paper investigates language mismatch in SASs for Japanese and Mandarin speech. First, we fine-tune a self-supervised learning (SSL)-based content encoder with Japanese speech to verify effective language adaptation. Then, we propose fine-tuning a multilingual SSL model with Japanese speech and evaluating the SAS in Japanese and Mandarin. Downstream experiments show that fine-tuning an English-only SSL model with the target language enhances intelligibility while maintaining privacy, and that a multilingual SSL model further extends SASs' utility across different languages. These findings highlight the importance of language adaptation and multilingual pre-training of SSL models for robust multilingual speaker anonymization.

Problem

Research questions and friction points this paper is trying to address.

Addressing language mismatch in speaker anonymization systems

Improving intelligibility for non-English languages like Japanese and Mandarin

Enhancing multilingual utility via SSL model fine-tuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tune SSL content encoder with target language

Use multilingual SSL model for cross-language adaptation

Enhance intelligibility while maintaining speaker privacy
