EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

📅 2026-05-09

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing affective understanding benchmarks fall short on ecological validity, signal clarity, and the reliability of fine-grained annotations, which hinders the training and evaluation of empathetic models. This work proposes EmoS, a high-fidelity bilingual multimodal affective benchmark that combines rigorously curated static clips with dynamic streaming monologues. To reconcile ecological validity with signal quality, EmoS introduces a dual-layer human annotation protocol and a streaming affect annotation framework. Multimodal large language models fine-tuned on EmoS significantly outperform zero-shot baselines, which supports the benchmark's usefulness for fine-grained, continuous affect modeling. The dataset and code are publicly released.

📝 Abstract

In today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more critical than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark designed to address the limited ecological validity and the noise of existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning multimodal large language models (MLLMs) on EmoS yields significant gains over zero-shot baselines, laying the foundation for training and evaluating future emotion recognition and empathy models. The dataset and code are publicly available at https://github.com/NLP2CT/EmoS.

Problem

Research questions and friction points this paper is trying to address.

emotional understanding

multimodal benchmark

fine-grained labeling

ecological validity

streaming emotion

Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark

fine-grained emotion recognition

streaming emotional understanding

dual-layer annotation

high-fidelity dataset

🔎 Similar Papers

OV-MER: Towards Open-Vocabulary Multimodal Emotion Recognition

2024-10-02

Citations: 0