🤖 AI Summary
Speech emotion recognition (SER) faces challenges in distinguishing subtle emotional differences and resolving high inter-class overlap. To address these challenges, this paper proposes a quantum-classical hybrid model integrating convolutional neural networks (CNNs) with parameterized quantum circuits (PQCs), leveraging quantum superposition and entanglement to enhance discriminative representation learning from speech features. We provide the first empirical evidence that PQCs not only reduce model parameter count by over 30% but also significantly improve SER accuracy. Moreover, we pioneer the application of quantum representation learning to multi-emotion-state modeling. The model is evaluated on three benchmark datasets (IEMOCAP, RECOLA, and MSP-Improv) and consistently outperforms purely classical baselines in both binary and multi-class SER tasks. These results support the effectiveness and generalizability of quantum enhancement for improving SER performance.
📝 Abstract
Speech Emotion Recognition (SER) is a complex and challenging task in human-computer interaction due to the intricate dependencies among features and the overlapping nature of emotional expressions conveyed through speech. Although traditional deep learning methods have shown effectiveness, they often struggle to capture subtle emotional variations and overlapping states. This paper introduces a hybrid classical-quantum framework that integrates Parameterised Quantum Circuits (PQCs) with conventional Convolutional Neural Network (CNN) architectures. By leveraging quantum properties such as superposition and entanglement, the proposed model enhances feature representation and captures complex dependencies more effectively than classical methods. Experimental evaluations conducted on benchmark datasets, including IEMOCAP, RECOLA, and MSP-Improv, demonstrate that the hybrid model achieves higher accuracy in both binary and multi-class emotion classification while significantly reducing the number of trainable parameters. While a few existing studies have explored the feasibility of using quantum circuits to reduce model complexity, none have shown that they can also enhance accuracy. This study is the first to demonstrate that quantum circuits have the potential to improve the accuracy of SER. The findings highlight the potential of quantum machine learning (QML) to transform SER and suggest a promising direction for future research and practical applications in emotion-aware systems.
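To make the role of a PQC in such a hybrid model concrete, the following is a minimal sketch of a small circuit simulated in plain Python. It is not the architecture used in the paper: the angle encoding, the chain of CNOT gates, the RY-only rotations, and all function names (`apply_ry`, `apply_cnot`, `expect_z`, `pqc`) are illustrative assumptions. In a hybrid model of this kind, `features` would stand in for a low-dimensional output of the CNN, `weights` would be the trainable circuit parameters, and the returned expectation values would feed a classical classification layer.

```python
import math


def apply_ry(state, qubit, theta, n_qubits):
    """Apply RY(theta) to one qubit of a real-valued state vector."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    bit = 1 << (n_qubits - 1 - qubit)  # qubit 0 is the most significant bit
    new = list(state)
    for i in range(len(state)):
        if i & bit == 0:
            j = i | bit
            a0, a1 = state[i], state[j]
            new[i] = c * a0 - s * a1
            new[j] = s * a0 + c * a1
    return new


def apply_cnot(state, control, target, n_qubits):
    """Flip the target qubit on every basis state whose control qubit is 1."""
    cbit = 1 << (n_qubits - 1 - control)
    tbit = 1 << (n_qubits - 1 - target)
    new = list(state)
    for i in range(len(state)):
        if i & cbit and not i & tbit:
            j = i | tbit
            new[i], new[j] = state[j], state[i]
    return new


def expect_z(state, qubit, n_qubits):
    """Expectation value of Pauli-Z on one qubit (amplitudes are real)."""
    bit = 1 << (n_qubits - 1 - qubit)
    return sum((-1.0 if i & bit else 1.0) * a * a for i, a in enumerate(state))


def pqc(features, weights):
    """Hypothetical PQC: angle encoding, then entangling + rotation layers.

    features: one rotation angle per qubit (assumed to come from a CNN).
    weights:  list of layers, each holding one trainable angle per qubit.
    Returns one Pauli-Z expectation value per qubit, each in [-1, 1].
    """
    n = len(features)
    state = [0.0] * (2 ** n)
    state[0] = 1.0  # start in |0...0>
    for q, x in enumerate(features):  # angle encoding creates superposition
        state = apply_ry(state, q, x, n)
    for layer in weights:
        for q in range(n - 1):  # CNOT chain entangles neighbouring qubits
            state = apply_cnot(state, q, q + 1, n)
        for q, theta in enumerate(layer):  # trainable rotations
            state = apply_ry(state, q, theta, n)
    return [expect_z(state, q, n) for q in range(n)]


if __name__ == "__main__":
    print(pqc([0.3, 1.2], [[0.5, -0.4], [0.1, 0.9]]))
```

Each layer in this sketch contributes only one trainable angle per qubit, which illustrates in a simplified way why a PQC block can carry far fewer parameters than a dense classical layer of comparable width. Whether that translates into an accuracy gain depends on the task and on the circuit design.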