🤖 AI Summary
Current large speech-language models (Speech-LLMs) exhibit limited empathetic reasoning, primarily because training data rarely combines contextual content with paralinguistic cues (e.g., prosody, rhythm). To address this, we propose a dual-path approach: (1) an explicit path that supplies paralinguistic metadata, such as emotion annotations, directly to the LLM; and (2) an implicit path that automatically generates new question-answer (QA) pairs from speech transcriptions together with categorical emotion labels and dimensional affect annotations. On a human-annotated QA benchmark, the implicit path alone improves LLM-judged empathetic reasoning performance by 38.41%, and combining both paths raises the gain to 46.02%. Furthermore, the LLM-based automatic evaluation is reported to correlate strongly with human annotation (Spearman’s ρ > 0.92), which supports the reliability of the LLM judge. This work combines explicit and implicit paralinguistic modeling for Speech-LLMs to improve their contextualized empathetic understanding.
📝 Abstract
Current large speech language models (Speech-LLMs) often exhibit limitations in empathetic reasoning, primarily due to the absence of training datasets that integrate both contextual content and paralinguistic cues. In this work, we propose two approaches to incorporate contextual paralinguistic information into model training: (1) an explicit method that provides paralinguistic metadata (e.g., emotion annotations) directly to the LLM, and (2) an implicit method that automatically generates novel training question-answer (QA) pairs using both categorical and dimensional emotion annotations alongside speech transcriptions. Our implicit method improves LLM-judged performance by 38.41% on a human-annotated QA benchmark, and the gain reaches 46.02% when it is combined with the explicit approach, demonstrating its effectiveness for contextual paralinguistic understanding. We also validate the LLM judge by demonstrating its correlation with classification metrics, which supports its reliability.
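As a rough illustration of the explicit method described above, the sketch below shows one way paralinguistic metadata could be placed directly into the text given to an LLM. It is not the authors' implementation: the `ParalinguisticMetadata` class, the `build_explicit_prompt` function, the prompt wording, the choice of valence and arousal as the dimensional annotations, and their 0-1 scale are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class ParalinguisticMetadata:
    """Hypothetical container for the annotations attached to one utterance."""
    emotion: str    # categorical emotion label, e.g. "sad"
    valence: float  # dimensional annotation; a 0-1 scale is assumed here
    arousal: float  # dimensional annotation; a 0-1 scale is assumed here


def build_explicit_prompt(transcript: str,
                          meta: ParalinguisticMetadata,
                          question: str) -> str:
    """Format transcript, metadata, and question into a single prompt string.

    The layout below is an illustrative assumption, not the paper's template.
    """
    return (
        f"Transcript: {transcript}\n"
        f"Speaker emotion: {meta.emotion} "
        f"(valence={meta.valence:.2f}, arousal={meta.arousal:.2f})\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Made-up example utterance, used only to show the resulting prompt layout.
example_meta = ParalinguisticMetadata(emotion="sad", valence=0.20, arousal=0.35)
prompt = build_explicit_prompt(
    "I finally got the results back today.",
    example_meta,
    "How is the speaker likely feeling, and why?",
)
print(prompt)
```

Keeping the metadata in a small typed structure means the same record could in principle feed both paths: formatted into a prompt for the explicit method, or handed to a QA generator for the implicit one.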