DiffAnon: Diffusion-based Prosody Control for Voice Anonymization

📅 2026-04-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the trade-off between prosody preservation and speaker privacy in voice anonymization, which existing methods cannot control flexibly because they operate at fixed design points. The authors propose DiffAnon, which they describe as the first framework to offer structured, continuous control over prosody preservation at inference time. DiffAnon is a diffusion model that refines acoustic detail over the semantic embeddings of a residual vector quantized (RVQ) codec, and it uses classifier-free guidance (CFG) to make prosody control explicit and interpolable. Experiments show that a single model can be moved across multiple operating points, achieving strong utility while maintaining competitive privacy and exposing a structured, controllable utility-privacy trade-off.

📝 Abstract

To preserve or not to preserve prosody is a central question in voice anonymization. Prosody conveys meaning and affect, yet is tightly coupled with speaker identity. Existing methods either discard prosody for privacy or lack a principled mechanism to control the utility-privacy trade-off, operating at fixed design points. We propose DiffAnon, a diffusion-based anonymization method with classifier-free guidance (CFG) that provides explicit, continuous inference-time control over prosody preservation. DiffAnon refines acoustic detail over semantic embeddings of an RVQ codec, enabling smooth interpolation between anonymization strength and prosodic fidelity within a single model. To the best of our knowledge, it is the first voice anonymization framework to provide structured, interpolatable inference-time prosody control. Experiments demonstrate structured trade-off behavior, achieving strong utility while maintaining competitive privacy across controllable operating points.
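
The abstract does not spell out the guidance rule, so the following is only a minimal sketch of the standard classifier-free guidance combination, assuming the denoiser can be queried with and without a prosody condition. The names `cfg_combine`, `guided_prediction`, `toy_denoiser`, and `guidance_scale` are hypothetical and are not taken from the paper; the toy denoiser stands in for a trained network.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# Hypothetical denoiser signature: (noisy input, optional prosody condition) -> prediction
Denoiser = Callable[[Sequence[float], Optional[Sequence[float]]], Vector]


def cfg_combine(eps_uncond: Sequence[float],
                eps_cond: Sequence[float],
                guidance_scale: float) -> Vector:
    """Standard CFG rule: eps = eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def guided_prediction(denoiser: Denoiser,
                      x_t: Sequence[float],
                      prosody: Sequence[float],
                      guidance_scale: float) -> Vector:
    """Query the denoiser with and without the prosody condition, then blend."""
    eps_uncond = denoiser(x_t, None)     # condition dropped: prosody ignored
    eps_cond = denoiser(x_t, prosody)    # condition kept: prosody followed
    return cfg_combine(eps_uncond, eps_cond, guidance_scale)


def toy_denoiser(x_t: Sequence[float],
                 prosody: Optional[Sequence[float]]) -> Vector:
    """Stand-in for a trained network (assumption, for illustration only)."""
    if prosody is None:
        return [0.5 * x for x in x_t]
    return [0.5 * x + p for x, p in zip(x_t, prosody)]


if __name__ == "__main__":
    x_t = [2.0, 4.0]
    prosody = [1.0, -1.0]
    for w in (0.0, 0.5, 1.0):
        print(w, guided_prediction(toy_denoiser, x_t, prosody, w))
```

In this sketch, a scale of 0 reproduces the unconditional prediction, a scale of 1 reproduces the conditional one, and values in between interpolate linearly. That is one plausible reading of how a single model could expose a continuous prosody-preservation knob at inference time; the paper's actual conditioning scheme and scale range may differ.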

Problem

Research questions and friction points this paper is trying to address.

voice anonymization

prosody preservation

privacy-utility trade-off

speaker identity

prosody control

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based anonymization

prosody control

classifier-free guidance