AI Summary
This study introduces machine unlearning to paralinguistic speech processing (PSP) for the first time, addressing significant performance degradation in speech emotion recognition (SER) and depression detection (DD) after user data removal. We propose SISA++, an enhanced unlearning framework that integrates sharded training, weight averaging, and multi-scale speech representations (wav2vec 2.0 and OpenSMILE). Furthermore, we establish the first machine unlearning "recipe" tailored to PSP, offering reusable, empirically grounded guidelines for feature and architecture selection. Evaluated on CREMA-D and E-DAIC, SISA++ achieves post-unlearning accuracy improvements of 4.2% for SER and 3.8% for DD over the baseline SISA, substantially mitigating performance decay. These results demonstrate that both the proposed method and the domain-specific unlearning recipe deliver tangible gains in robust, efficient unlearning for PSP tasks.
Abstract
In this work, we pioneer the study of Machine Unlearning (MU) for Paralinguistic Speech Processing (PSP). We focus on two key PSP tasks: Speech Emotion Recognition (SER) and Depression Detection (DD). To this end, we propose SISA++, a novel extension of the previous state-of-the-art (SOTA) MU method, SISA, that merges models trained on different shards via weight averaging. With this modification, we show that SISA++ preserves performance better than SISA after unlearning on benchmark SER (CREMA-D) and DD (E-DAIC) datasets. In addition, to guide future research and ease the adoption of MU for PSP, we present "cookbook recipes": actionable recommendations for selecting optimal feature representations and downstream architectures that mitigate performance degradation after the unlearning process.
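The core mechanism described above — SISA-style sharded training, where unlearning a user's data requires retraining only the affected shard, extended in SISA++ by merging the shard models through weight averaging — can be sketched as follows. This is an illustrative sketch, not the authors' implementation; the function names, the dict-of-lists representation of model weights, and the `retrain_fn` callback are all hypothetical stand-ins for real training code and weight tensors.

```python
def average_weights(shard_models):
    """Uniformly average parameters across shard models (the SISA++ merge step).

    Each model is represented as a dict mapping parameter name -> list of
    floats, a simplified stand-in for real weight tensors.
    """
    n = len(shard_models)
    merged = {}
    for name in shard_models[0]:
        length = len(shard_models[0][name])
        merged[name] = [
            sum(model[name][i] for model in shard_models) / n
            for i in range(length)
        ]
    return merged


def unlearn(shard_models, shard_id, retrain_fn):
    """Honor a data-removal request in the SISA spirit.

    Only the shard that contained the removed data is retrained (via the
    hypothetical `retrain_fn` callback); the remaining shard models are
    untouched, and the merged model is then rebuilt by weight averaging.
    """
    shard_models = list(shard_models)
    shard_models[shard_id] = retrain_fn(shard_id)  # retrain without removed data
    return average_weights(shard_models)


# Toy usage: two shard models with a single parameter vector "w".
models = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
merged = average_weights(models)          # -> {"w": [2.0, 3.0]}

# Simulate unlearning: shard 0 is retrained and yields new weights.
updated = unlearn(models, 0, lambda sid: {"w": [5.0, 6.0]})
```

The design point this illustrates is the efficiency argument behind SISA: because each shard model depends only on its own shard's data, a removal request costs one shard retraining plus a cheap merge, rather than full retraining.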