🤖 AI Summary
This work addresses a limitation of existing fully multimodal large language models: their implicit Thinker-Talker architectures struggle to accurately capture contextual emotions, often producing distorted affective responses. To overcome this, the authors propose EmoOmni, a unified framework that introduces, for the first time, an explicit Emotional Chain-of-Thought (E-CoT) mechanism. E-CoT enables end-to-end emotional reasoning, from fine-grained multimodal perception to text generation, and explicitly guides the dialogue module by serving as a high-level instruction. Concurrently, the study establishes a real-world multimodal emotional dialogue data pipeline and a dedicated evaluation benchmark, EmoOmniEval. Experimental results demonstrate that EmoOmni-7B achieves emotional dialogue performance on par with Qwen3Omni-30B-A3B-Thinking under identical Talker conditions.
📝 Abstract
The evolution of Omni-Modal Large Language Models (Omni-LLMs) has revolutionized human-computer interaction, enabling unified audio-visual perception and speech response. However, existing Omni-LLMs struggle with complex real-world scenarios, often producing superficial understanding and contextually mismatched emotional responses. This issue is further intensified by the Thinker-Talker architecture of Omni-LLMs, in which the two modules are connected only implicitly through hidden states, causing the loss of emotional details. In this work, we present EmoOmni, a unified framework for accurate understanding and expression in multimodal emotional dialogue. At its core, we introduce the Emotional Chain-of-Thought (E-CoT), which enforces a reasoning chain from fine-grained multimodal perception to textual response. Moreover, we explicitly treat E-CoT as high-level emotional instructions that guide the Talker, enabling accurate emotional expression. Complementing the model, we construct EmoOmniPipe to obtain real-world annotated dialogue data and establish a benchmark, EmoOmniEval, to facilitate systematic assessment of the multimodal emotional dialogue task. Experiments show that EmoOmni-7B achieves performance comparable to Qwen3Omni-30B-A3B-Thinking with the same Talker.