A Unified Spoken Language Model with Injected Emotional-Attribution Thinking for Human-like Interaction

📅 2026-01-08

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the shallow modeling of emotion understanding in spoken dialogue systems by proposing Injected Emotional-Attribution Thinking (IEAT), a data construction strategy that places the user's emotional state and its underlying cause inside the model's internal reasoning. Emotion-aware reasoning is thereby internalized rather than treated as an explicit supervision target. The model is trained in two progressive stages: the first performs speech-text alignment and models emotional attributes via self-distillation, and the second applies end-to-end cross-modal joint optimization to keep textual and spoken emotional expression consistent. On the Emotional Intelligence benchmark of the Human-like Spoken Dialogue Systems Challenge (HumDial), the approach achieves top-ranked performance on emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.

📝 Abstract

This paper presents a unified spoken language model for emotional intelligence, enhanced by a novel data construction strategy termed Injected Emotional-Attribution Thinking (IEAT). IEAT incorporates user emotional states and their underlying causes into the model's internal reasoning process, enabling emotion-aware reasoning to be internalized rather than treated as explicit supervision. The model is trained with a two-stage progressive strategy. The first stage performs speech-text alignment and emotional attribute modeling via self-distillation, while the second stage conducts end-to-end cross-modal joint optimization to ensure consistency between textual and spoken emotional expressions. Experiments on the Human-like Spoken Dialogue Systems Challenge (HumDial) Emotional Intelligence benchmark demonstrate that the proposed approach achieves top-ranked performance across emotional trajectory modeling, emotional reasoning, and empathetic response generation under both LLM-based and human evaluations.
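To make the idea of "injecting" emotional attribution into the reasoning process more concrete, the sketch below shows one way a single IEAT-style training target could be assembled. It is a minimal illustration under stated assumptions, not the authors' pipeline. The abstract does not specify a data format, so the `DialogueTurn` fields, the `<think>`/`</think>` delimiters, and the wording of the thought template are all hypothetical. The point it illustrates is that the emotion and its cause appear only inside a reasoning segment that precedes the reply, instead of being a separate classification label.

```python
from dataclasses import dataclass

# Hypothetical delimiters for the internal reasoning segment (assumption, not from the paper).
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"


@dataclass
class DialogueTurn:
    """One annotated user turn. All field names are hypothetical."""
    user_text: str  # transcript of the user's spoken turn
    emotion: str    # annotated emotional state, e.g. "frustrated"
    cause: str      # annotated cause of that emotion
    response: str   # target empathetic reply


def build_ieat_sample(turn: DialogueTurn) -> dict:
    """Place emotion and cause inside a reasoning segment that precedes the reply.

    The emotion is not emitted as a standalone label; it exists only in the
    thought text, so the model learns to reason about it before responding.
    """
    # Hypothetical thought template combining the emotional state and its cause.
    thought = (
        f"The user seems {turn.emotion}, because {turn.cause}. "
        "I should acknowledge this before answering."
    )
    target = f"{THINK_OPEN}{thought}{THINK_CLOSE}{turn.response}"
    return {"input": turn.user_text, "target": target}


if __name__ == "__main__":
    turn = DialogueTurn(
        user_text="I missed my train again this morning.",
        emotion="frustrated",
        cause="they missed their train again",
        response="That sounds really annoying. Want to check earlier departures together?",
    )
    print(build_ieat_sample(turn)["target"])
```

Keeping the attribution inside the reasoning text, rather than adding a parallel emotion-classification head, mirrors the abstract's claim that emotion-aware reasoning is internalized rather than treated as explicit supervision. In an actual spoken model, `input` would presumably be speech features rather than a transcript.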

Problem

Research questions and friction points this paper is trying to address.

spoken language model

emotional intelligence

empathetic response generation

emotion-aware reasoning

human-like interaction

Innovation

Methods, ideas, or system contributions that make the work stand out.

Injected Emotional-Attribution Thinking

emotion-aware reasoning

spoken language model