Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation

📅 2024-06-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Addressing the dual challenges of resource constraints on IoT edge devices and privacy sensitivity of speech emotion data, this paper pioneers the application of data distillation to speech emotion recognition (SER). We propose a lightweight, synthetic, and privacy-preserving speech-level data distillation framework grounded in knowledge distillation principles. Our method integrates speech compression with semantic fidelity preservation, coupled with fixed-initialization training and emotion-feature disentanglement modeling—enabling high-quality distilled dataset generation without accessing original sensitive speech. Evaluated on multiple SER benchmarks, a lightweight model trained solely on 5% distilled data achieves 98.3% of the accuracy attained by a model trained on the full dataset. This yields substantial reductions in memory footprint and computational overhead, simultaneously ensuring high performance, ultra-low resource consumption, and strong privacy protection.

Technology Category

Application Category

📝 Abstract
Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
Problem

Research questions and friction points this paper is trying to address.

Overcoming memory and computational limits in IoT SER models
Addressing privacy risks in emotional speech data usage
Enabling efficient SER training with distilled datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data distillation for efficient SER models
Synthesized smaller dataset for IoT
Fixed initialization achieves original performance
🔎 Similar Papers
No similar papers found.
Y
Yi Chang
GLAM – the Group on Language, Audio, & Music, Imperial College London, United Kingdom
Z
Zhao Ren
Cognitive Systems Lab, University of Bremen, Germany
Z
Zhonghao Zhao
School of Medical Technology, Beijing Institute of Technology, China
Thanh Tam Nguyen
Thanh Tam Nguyen
Lecturer, Griffith University
Social Network MiningStream ProcessingBig DataPrivacy-Preserving MLRecommender Systems
K
Kun Qian
School of Medical Technology, Beijing Institute of Technology, China
Tanja Schultz
Tanja Schultz
Professor of Computer Science, University Bremen
Speech RecognitionBiosignalsSilent SpeechHuman-Machine-InterfacesBrain-Computer Interfaces
B
Björn W. Schuller
GLAM – the Group on Language, Audio, & Music, Imperial College London, United Kingdom; Chair of Health Informatics, Klinikum rechts der Isar (MRI), Technical University of Munich (TUM), Germany