Speech Emotion Recognition with ASR Integration

📅 2026-01-25
🤖 AI Summary
This work addresses speech emotion recognition (SER) in real-world, spontaneous, and low-resource settings, where the complexity of emotional expression and the limits of current technology hinder performance. It presents the first systematic integration of automatic speech recognition (ASR) into the SER pipeline, fusing acoustic and textual modalities by jointly modeling speech signals and ASR-generated transcripts. By drawing on complementary information from both modalities, the proposed approach significantly improves recognition accuracy and robustness under low-resource and spontaneous speech conditions. This makes SER systems more scalable and adaptable in practical deployments where labeled data is limited and speech is naturalistic.

📝 Abstract
Speech Emotion Recognition (SER) plays a pivotal role in understanding human communication, enabling emotionally intelligent systems, and serving as a fundamental component in the development of Artificial General Intelligence (AGI). However, deploying SER in real-world, spontaneous, and low-resource scenarios remains a significant challenge due to the complexity of emotional expression and the limitations of current speech and language technologies. This thesis investigates the integration of Automatic Speech Recognition (ASR) into SER, with the goal of enhancing the robustness, scalability, and practical applicability of emotion recognition from spoken language.
Problem

Research questions and friction points this paper is trying to address.

Speech Emotion Recognition
Automatic Speech Recognition
low-resource scenarios
emotional expression
real-world deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Speech Emotion Recognition
Automatic Speech Recognition
Emotion Recognition Integration
Low-resource SER
Multimodal Emotion Analysis