EmoS: A High-Fidelity Multimodal Benchmark for Fine-grained Streaming Emotional Understanding

📅 2026-05-09
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing affective understanding benchmarks are limited in ecological validity, signal clarity, and the reliability of fine-grained annotations, which hinders the training and evaluation of empathetic models. This work proposes EmoS, a high-fidelity bilingual multimodal affective benchmark that combines rigorously curated static clips with dynamic streaming monologues. To reconcile ecological validity with signal quality, EmoS introduces a dual-layer human annotation protocol and a streaming affect annotation framework. Multimodal large language models (MLLMs) fine-tuned on EmoS significantly outperform zero-shot baselines, demonstrating the benchmark's effectiveness in supporting fine-grained, continuous affect modeling. The dataset and code are publicly released.
📝 Abstract
In today's high-pressure, aging society, the demand for large-scale emotional models capable of providing empathetic support is more pressing than ever. However, existing benchmarks fail to simultaneously achieve ecological validity, signal clarity, and reliable fine-grained labeling. We introduce EmoS, a high-fidelity bilingual benchmark that addresses the limited ecological validity and the noise of existing datasets by combining strictly filtered static slices with a dynamic Streaming Monologue subset. Supported by a rigorous dual-layer human annotation pipeline, EmoS provides trusted ground truth that captures continuous emotional evolution. Empirical results show that fine-tuning multimodal large language models (MLLMs) on EmoS yields significant gains over zero-shot baselines, laying a foundation for training and evaluating future emotion recognition and empathy models. The dataset and code are publicly available at https://github.com/NLP2CT/EmoS.
Problem

Research questions and friction points this paper is trying to address.

emotional understanding
multimodal benchmark
fine-grained labeling
ecological validity
streaming emotion
Innovation

Methods, ideas, or system contributions that make the work stand out.

multimodal benchmark
fine-grained emotion recognition
streaming emotional understanding
dual-layer annotation
high-fidelity dataset