🤖 AI Summary
Existing audio unlearning methods struggle with the temporal dynamics and high dimensionality of speech, failing to completely erase individual voiceprints from biometric models—thus falling short of GDPR and other privacy regulations. This paper proposes the first quantum-inspired audio unlearning framework, integrating quantum principles—destructive interference for weight initialization, superposition-based label transformation, uncertainty-maximizing loss, and entanglement-driven weight mixing—into speaker identity removal. Implemented on architectures including ResNet-18 and Vision Transformers (ViT), the method achieves precise, targeted unlearning. Experiments demonstrate 0% re-identification accuracy for erased speakers while incurring only a 0.05% performance drop on retained data—substantially outperforming conventional approaches. The framework thus delivers strong privacy guarantees without compromising model utility, bridging a critical gap between regulatory compliance and practical deployment in speaker recognition systems.
📝 Abstract
The widespread adoption of voice-enabled authentication and audio biometric systems has significantly increased privacy vulnerabilities associated with sensitive speech data. Compliance with privacy regulations such as GDPR's right to be forgotten and India's DPDP Act necessitates targeted and efficient erasure of individual-specific voice signatures from already-trained biometric models. Existing unlearning methods designed for visual data inadequately handle the sequential, temporal, and high-dimensional nature of audio signals, leading to ineffective or incomplete speaker and accent erasure. To address this, we introduce QPAudioEraser, a quantum-inspired audio unlearning framework. Our four-phase approach involves: (1) weight initialization using destructive interference to nullify target features, (2) superposition-based label transformations that obscure class identity, (3) an uncertainty-maximizing quantum loss function, and (4) entanglement-inspired mixing of correlated weights to retain model knowledge. Comprehensive evaluations with ResNet18, ViT, and CNN architectures across AudioMNIST, Speech Commands, LibriSpeech, and Speech Accent Archive datasets validate QPAudioEraser's superior performance. The framework achieves complete erasure of target data (0% Forget Accuracy) while incurring minimal impact on model utility, with performance degradation on retained data as low as 0.05%. QPAudioEraser consistently surpasses conventional baselines across single-class, multi-class, sequential, and accent-level erasure scenarios, establishing the proposed approach as a robust privacy-preserving solution.
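The four phases enumerated in the abstract can be sketched as simple NumPy operations. This is an illustrative interpretation only, not the paper's implementation: the function names, the realization of "destructive interference" as a π-phase (sign-flipped) subtraction, and the α/β mixing parameterization are all assumptions.

```python
import numpy as np

def destructive_interference_init(w, w_target):
    """Phase 1 (sketch): cancel target-specific weight components via a
    pi-phase shift, i.e. w + e^{i*pi} * w_target = w - w_target."""
    return w + np.real(np.exp(1j * np.pi)) * w_target

def superposition_labels(one_hot, target_idx):
    """Phase 2 (sketch): replace the forget class's one-hot label with a
    uniform 'superposition' over the remaining classes."""
    y = one_hot.astype(float).copy()
    if y[target_idx] == 1.0:
        y[:] = 1.0 / (len(y) - 1)
        y[target_idx] = 0.0
    return y

def uncertainty_loss(probs, eps=1e-12):
    """Phase 3 (sketch): maximize predictive entropy on the forget set
    by minimizing the negative entropy -H(p)."""
    return np.sum(probs * np.log(probs + eps))

def entangled_mix(w_a, w_b, alpha=0.9):
    """Phase 4 (sketch): mix correlated weights so retained knowledge is
    redistributed; alpha/beta act like amplitudes with alpha^2+beta^2=1."""
    beta = np.sqrt(1.0 - alpha**2)
    return alpha * w_a + beta * w_b
```

For example, `uncertainty_loss` is lower (better) for a uniform prediction than for a confident one, so gradient descent on it pushes the model toward maximal uncertainty on erased speakers.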