A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the shallow modeling of emotion understanding in spoken dialogue systems by proposing Injected Emotional-Attribution Thinking (IEAT), a data construction strategy that incorporates user emotions and their underlying causes into the model's internal reasoning, so that emotion-aware reasoning is internalized rather than treated as explicit supervision. The model is trained in two progressive stages: the first performs speech-text alignment and emotional attribute modeling via self-distillation, and the second performs end-to-end cross-modal joint optimization to keep textual and spoken emotional expressions consistent. On the HumDial Emotional Intelligence benchmark, the method achieves top-ranked performance across three tasks (emotional trajectory modeling, emotional reasoning, and empathetic response generation) under both LLM-based and human evaluations.

πŸ“ Abstract
This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
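
The abstract does not specify how IEAT samples are formatted. As a purely illustrative sketch, the idea of injecting the user's emotion and its cause into the model's reasoning could look like the following. The `<think>` delimiter, the `DialogueTurn` fields, and the `build_ieat_target` helper are all assumptions made for this example and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class DialogueTurn:
    """One user turn with hypothetical emotion annotations (not the paper's schema)."""
    user_text: str
    emotion: str   # annotated user emotion, e.g. "frustrated"
    cause: str     # annotated cause of that emotion
    response: str  # empathetic reply the model should produce


def build_ieat_target(turn: DialogueTurn) -> str:
    """Prefix the response with a thinking segment naming the emotion and its cause.

    The <think>...</think> wrapper is an assumed format chosen for illustration.
    """
    thinking = (
        f"The user sounds {turn.emotion}. "
        f"The likely cause is that {turn.cause}."
    )
    return f"<think>{thinking}</think>\n{turn.response}"


turn = DialogueTurn(
    user_text="I failed my driving test again.",
    emotion="frustrated",
    cause="they failed the driving test a second time",
    response="That sounds really discouraging. Do you want to talk through what happened?",
)
target = build_ieat_target(turn)
print(target)
```

Under this assumed format, the emotion and cause appear only inside the reasoning segment of the training target, which is one way to read "internalized rather than treated as explicit supervision"; the paper's actual construction may differ.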
Problem

Research questions and friction points this paper is trying to address.

spoken language model
emotional intelligence
empathetic response generation
emotion-aware reasoning
human-like interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Injected Emotional-Attribution Thinking
emotion-aware reasoning
spoken language model
cross-modal joint optimization
self-distillation