Private kNN-VC: Interpretable Anonymization of Converted Speech

2025-05-23
AI Summary
This work addresses speaker identity leakage in voice anonymization caused by prosodic features, revealing that phoneme-level duration and prosodic variation constitute critical discriminative cues for black-box speaker recognition models. We propose the first interpretable prosody anonymization method tailored for kNN-VC architectures, integrating differentiable phoneme duration modeling, prosodic variation masking, and an adversarial target-speech selection strategy. Experimental results demonstrate that phoneme-level prosodic features indeed serve as primary carriers of speaker identity. Our method achieves over a 40% reduction in equal error rate (EER) under black-box attacks while preserving speech naturalness and linguistic intelligibility, thereby significantly enhancing the privacy-utility trade-off.
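The equal error rate mentioned above is the operating point at which a speaker verification system's false-acceptance and false-rejection rates coincide. As a minimal sketch of how it can be approximated from trial scores (the function name and the toy scores are assumptions for illustration, not the paper's evaluation code):

```python
import numpy as np

def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Hypothetical helper for illustration; real evaluations typically use
    a toolkit's ROC-based implementation.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    # False rejection rate: same-speaker trials scored below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: different-speaker trials scored at or above it.
    far = np.array([(impostor >= t).mean() for t in thresholds])
    # Pick the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return (far[idx] + frr[idx]) / 2.0

# Toy scores: well-separated trials versus indistinguishable trials.
separable = equal_error_rate([0.9, 0.8, 0.7], [0.1, 0.2, 0.3])
chance = equal_error_rate([0.1, 0.2, 0.3, 0.4], [0.1, 0.2, 0.3, 0.4])
```

When the same-speaker and different-speaker score distributions are indistinguishable, the approximated EER sits at 0.5, which corresponds to a verifier operating at chance.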

Abstract
Speaker anonymization seeks to conceal a speaker's identity while preserving the utility of their speech. The achieved privacy is commonly evaluated with a speaker recognition model trained on anonymized speech. Although this represents a strong attack, it is unclear which aspects of speech are exploited to identify the speakers. Our research sets out to unveil these aspects. It starts with kNN-VC, a powerful voice conversion model that performs poorly as an anonymization system, presumably because of prosody leakage. To test this hypothesis, we extend kNN-VC with two interpretable components that anonymize the duration and variation of phones. These components increase privacy significantly, proving that the studied prosodic factors encode speaker identity and are exploited by the privacy attack. Additionally, we show that changes in the target selection algorithm considerably influence the outcome of the privacy attack.
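For readers unfamiliar with kNN-VC, its core conversion step replaces each source frame's feature vector with the average of its nearest neighbours among the target speaker's frames. The sketch below illustrates that matching step only. The random arrays stand in for self-supervised speech features, and the function name, shapes, and default k are assumptions for illustration; feature extraction and vocoding are omitted.

```python
import numpy as np

def knn_convert(source_feats, target_feats, k=4):
    """Map each source frame to the mean of its k nearest target frames.

    source_feats: (n_src, dim) array, target_feats: (n_tgt, dim) array.
    Nearest neighbours are chosen by cosine similarity. Illustrative only.
    """
    # L2-normalise rows so that a dot product equals cosine similarity.
    src = source_feats / np.linalg.norm(source_feats, axis=1, keepdims=True)
    tgt = target_feats / np.linalg.norm(target_feats, axis=1, keepdims=True)
    sim = src @ tgt.T  # shape (n_src, n_tgt)
    # Indices of the k most similar target frames for every source frame.
    nn_idx = np.argpartition(-sim, k - 1, axis=1)[:, :k]
    # Average the selected (unnormalised) target frames.
    return target_feats[nn_idx].mean(axis=1)

rng = np.random.default_rng(0)
target = rng.normal(size=(200, 16))   # placeholder target-speaker features
source = rng.normal(size=(50, 16))    # placeholder source-utterance features
converted = knn_convert(source, target)
```

Because every output frame is built from target frames, the spectral content comes from the target speaker, while the number and timing of frames are inherited from the source. That inherited timing is where the prosody leakage hypothesised in the abstract would enter.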
Problem

Research questions and friction points this paper is trying to address.

Identify prosodic factors leaking speaker identity in anonymized speech
Enhance kNN-VC with interpretable components to anonymize phone duration and variation
Evaluate impact of target selection algorithms on privacy attack outcomes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends kNN-VC with interpretable anonymization components
Anonymizes phone duration and variation for privacy
Modifies target selection to enhance privacy outcomes
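This page does not spell out how the duration and variation components operate. Purely to make the two ideas concrete, the sketch below assumes a per-frame phone alignment is available, removes within-phone variation by replacing each frame with its segment mean, and removes duration cues by resampling every phone to a fixed number of frames. Both strategies, the function names, and the `frames_per_phone` parameter are hypothetical and should not be read as the authors' method.

```python
import numpy as np

def _phone_segments(phone_ids):
    """Return (start, end) frame index pairs for runs of identical phone labels."""
    phone_ids = np.asarray(phone_ids)
    change = np.concatenate(([True], phone_ids[1:] != phone_ids[:-1]))
    starts = np.flatnonzero(change)
    ends = np.append(starts[1:], len(phone_ids))
    return list(zip(starts, ends))

def flatten_phone_variation(feats, phone_ids):
    """Assumed strategy: replace every frame with the mean of its phone segment."""
    feats = np.asarray(feats, dtype=float)
    out = np.empty_like(feats)
    for start, end in _phone_segments(phone_ids):
        out[start:end] = feats[start:end].mean(axis=0)
    return out

def fix_phone_durations(feats, phone_ids, frames_per_phone=3):
    """Assumed strategy: resample each phone segment to a fixed frame count."""
    feats = np.asarray(feats, dtype=float)
    pieces = []
    for start, end in _phone_segments(phone_ids):
        # Evenly spaced frame indices inside the segment, repeated if it is short.
        idx = np.linspace(start, end - 1, frames_per_phone).round().astype(int)
        pieces.append(feats[idx])
    return np.concatenate(pieces, axis=0)

# Toy input: 6 frames of 2-dimensional features covering three phones.
feats = np.arange(12, dtype=float).reshape(6, 2)
phone_ids = [0, 0, 0, 1, 1, 2]
flat = flatten_phone_variation(feats, phone_ids)
fixed = fix_phone_durations(feats, phone_ids, frames_per_phone=3)
```

In this toy setup, `flat` keeps the original timing but carries one vector per phone, while `fixed` gives every phone the same length regardless of how long the speaker held it.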
Carlos Franzreb
German Research Center for Artificial Intelligence
Deep learning, speech processing, privacy
Arnab Das
German Research Center for Artificial Intelligence, Germany
Tim Polzehl
German Research Center for Artificial Intelligence
Speech and Language technology
Sebastian Moller
Technical University of Berlin, Germany