Continual Speaker Identity Unlearning with Minimal Interference

📅 2026-05-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a privacy failure in continual speaker identity unlearning for zero-shot text-to-speech systems: with existing methods, processing a new forgetting request can revive previously forgotten identities. The authors propose what they describe as the first continual speaker unlearning framework that operates without access to data from previously unlearned speakers. The approach combines Fisher-information-based parameter masking with projection of each update orthogonal to the subspace spanned by prior unlearning updates, which limits interference between successive forgetting tasks. Experiments on the VoiceBox model indicate that the framework removes newly requested speakers while keeping previously forgotten identities forgotten, substantially outperforming sequential application of existing unlearning techniques.

📝 Abstract

Machine unlearning removes designated concepts or knowledge from pre-trained models. Recent work has extended this paradigm to speaker identity unlearning in zero-shot text-to-speech (ZS-TTS), the task of selectively erasing a model's ability to replicate a speaker's voice. Existing methods, however, quietly assume that all unlearning requests arrive at once, an unrealistic assumption since privacy-motivated removals arrive sequentially over time. We show this assumption breaks state-of-the-art methods: unlearning each new speaker fully revives previously unlearned speakers, reintroducing the very privacy risk unlearning was meant to eliminate. We present Cumulative ORThogonal Identity Suppression (CORTIS), the first framework for continual speaker identity unlearning in ZS-TTS that requires no access to previously unlearned speaker data. CORTIS combines Fisher-information-based parameter masking, which localizes updates to speaker-relevant weights, with orthogonal projection against subspaces spanned by prior unlearning updates. On VoiceBox, CORTIS unlearns each requested speaker while keeping previously unlearned speakers forgotten across long request sequences, substantially outperforming sequential application of prior methods. The demo is available at https://cumulativeortis.github.io/.
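
The two ingredients named in the abstract, Fisher-based masking and orthogonal projection against prior updates, can be illustrated with a minimal NumPy sketch on flattened parameter vectors. This is not the authors' implementation. The function names, the `keep_ratio` parameter, the diagonal empirical Fisher estimate, the choice to mask before projecting, and the storage of one basis vector per request are all assumptions made for illustration.

```python
import numpy as np


def fisher_mask(per_sample_grads, keep_ratio=0.1):
    """Binary mask over weights with the largest diagonal empirical Fisher.

    per_sample_grads: array of shape (n_samples, n_params).
    keep_ratio is a hypothetical hyperparameter, not taken from the paper.
    """
    fisher = np.mean(per_sample_grads ** 2, axis=0)  # diagonal Fisher estimate
    k = max(1, int(round(keep_ratio * fisher.size)))
    threshold = np.partition(fisher, -k)[-k]  # k-th largest Fisher value
    return (fisher >= threshold).astype(per_sample_grads.dtype)


def project_out(update, basis):
    """Remove the part of `update` that lies in span(basis).

    basis: (n_params, r) matrix with orthonormal columns (r may be 0).
    """
    if basis.shape[1] == 0:
        return update
    return update - basis @ (basis.T @ update)


def extend_basis(basis, update, tol=1e-10):
    """Append the direction of `update` to the orthonormal basis (Gram-Schmidt)."""
    residual = project_out(update, basis)
    norm = np.linalg.norm(residual)
    if norm < tol:  # update already lies in the stored subspace
        return basis
    return np.hstack([basis, (residual / norm)[:, None]])


def continual_unlearning_step(per_sample_grads, basis, keep_ratio=0.1):
    """One illustrative unlearning step for a new forgetting request.

    Assumed order: mask the raw update first, then project it away from the
    subspace of earlier updates. Only the direction of each applied step is
    stored, so no data from earlier speakers is needed.
    """
    mask = fisher_mask(per_sample_grads, keep_ratio)
    raw_update = mask * per_sample_grads.mean(axis=0)
    step = project_out(raw_update, basis)
    return step, extend_basis(basis, step)
```

Starting from an empty basis such as `np.zeros((n_params, 0))` and calling `continual_unlearning_step` once per request yields steps that are mutually orthogonal, which is the sense in which a later request does not move the parameters along directions used by earlier ones. In this ordering the projected step may be nonzero outside the mask; how the paper reconciles the two operations is not stated in the abstract.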

Problem

Research questions and friction points this paper is trying to address.

continual unlearning

speaker identity unlearning

zero-shot text-to-speech

privacy preservation

machine unlearning

Innovation

Methods, ideas, or system contributions that make the work stand out.

continual unlearning

speaker identity unlearning

zero-shot TTS