🤖 AI Summary
This work reports on the third VoicePrivacy Challenge (2024), whose task was voice anonymization: concealing speaker identity while preserving linguistic content and emotional state. The challenge framework treats emotional-state preservation as an explicit utility dimension, assessed alongside privacy protection and content preservation. It defines an attack model and objective metrics for each dimension, and uses them to evaluate six baseline systems and the systems submitted by participants, which draw on approaches such as speaker-embedding manipulation, voice conversion, and generative modeling. The paper closes with observations on the trade-off between privacy and speech utility and with directions for future VoicePrivacy challenges and voice anonymization research.
📝 Abstract
We present results and analyses from the third VoicePrivacy Challenge held in 2024, which focuses on advancing voice anonymization technologies. The task was to develop a voice anonymization system for speech data that conceals a speaker's voice identity while preserving linguistic content and emotional state. We provide a systematic overview of the challenge framework, including detailed descriptions of the anonymization task and datasets used for both system development and evaluation. We outline the attack model and objective evaluation metrics for assessing privacy protection (concealing speaker voice identity) and utility (content and emotional state preservation). We describe six baseline anonymization systems and summarize the innovative approaches developed by challenge participants. Finally, we provide key insights and observations to guide the design of future VoicePrivacy challenges and identify promising directions for voice anonymization research.
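The abstract does not name the metrics. As we understand the 2024 evaluation plan, privacy is scored by the equal error rate (EER) of an automatic speaker verification attacker, content preservation by the word error rate (WER) of a speech recognizer, and emotion preservation by the unweighted average recall (UAR) of a speech emotion recognizer. The sketch below computes these three quantities from raw scores and labels. The function names are hypothetical, and the simple threshold sweep for EER is an illustration, not the challenge's official scoring tools.

```python
from typing import Sequence


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER by sweeping thresholds over the observed scores.

    Target trials pair two utterances of the same speaker; non-target
    trials pair different speakers. Higher EER means the attacker
    struggles more, i.e. better privacy.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        # Keep the threshold where the two error rates are closest.
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] is the edit distance between ref[:i] and hyp[:j].
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


def unweighted_average_recall(y_true: Sequence[str],
                              y_pred: Sequence[str]) -> float:
    """Mean of per-class recalls, so rare emotions count as much as common ones."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    print(equal_error_rate([0.9, 0.8], [0.1, 0.2]))
    print(word_error_rate("the cat sat", "the bat sat"))
    print(unweighted_average_recall(["neu", "neu", "neu", "ang"], ["neu"] * 4))
```

In the last call, a recognizer that always predicts "neutral" reaches 75% plain accuracy but only 0.5 UAR. This is why a class-balanced metric suits emotion data, where some emotional states are much rarer than others.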