🤖 AI Summary
Existing descriptive multimodal emotion recognition (DMER) evaluation either relies on costly human-annotated natural-language descriptions or degenerates into coarse-grained classification, losing critical affective dimensions such as temporal dynamics, intensity, and uncertainty. To address this, we propose DMER-Ranker, a ground-truth-free evaluation paradigm based on pairwise ranking. Our approach makes three key contributions: (1) reformulating the "prediction–ground truth" comparison as a "prediction–prediction" comparison, eliminating the need for reference descriptions; (2) applying the Bradley–Terry algorithm to convert pairwise comparison results into model-level rankings; and (3) DMER-Preference, the first preference dataset specifically designed for human emotions, which enables automatic preference prediction. By removing the dependence on reference descriptions, our framework improves evaluation efficiency and scalability while preserving fine-grained affective semantics, establishing an interpretable foundation for advanced emotion understanding and human–machine interaction.
📝 Abstract
Descriptive Multimodal Emotion Recognition (DMER) is a newly proposed task that aims to describe a person's emotional state using free-form natural language. Unlike traditional discriminative methods that rely on predefined emotion taxonomies, DMER provides greater flexibility in emotional expression, enabling fine-grained and interpretable emotion representations. However, this free-form prediction paradigm introduces significant challenges in evaluation. Existing methods either depend on ground-truth descriptions that require substantial manual effort or simplify the task by shifting the focus from evaluating descriptions to evaluating emotion labels. The former suffers from the labor-intensive collection of comprehensive descriptions, while the latter overlooks critical aspects such as emotional temporal dynamics, intensity, and uncertainty. To address these limitations, we propose DMER-Ranker, a novel evaluation strategy that reformulates the traditional "prediction–ground truth" comparison into a "prediction–prediction" comparison, eliminating the need for ground-truth descriptions. We then employ the Bradley–Terry algorithm to convert pairwise comparison results into model-level rankings. Additionally, we explore the possibility of automatic preference prediction and introduce DMER-Preference, the first preference dataset specifically designed for human emotions. Our work advances the field of DMER and lays the foundation for more intelligent human-computer interaction systems.
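The Bradley–Terry step above can be sketched in a few lines: given pairwise preference counts between models, the standard MM (minorization-maximization) update iteratively estimates a latent strength for each model, and sorting by strength yields the model-level ranking. The win counts and model names below are hypothetical illustrations, not data from the paper:

```python
def bradley_terry(wins, iters=200, tol=1e-9):
    """Fit Bradley-Terry strengths from a pairwise win matrix.

    wins[i][j] = number of times model i was preferred over model j.
    Returns normalized strengths (higher = more preferred), using the
    classic MM update: p_i = W_i / sum_j (n_ij / (p_i + p_j)).
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(total_wins / denom if denom > 0 else p[i])
        s = sum(new_p)
        new_p = [x / s for x in new_p]
        if max(abs(a - b) for a, b in zip(p, new_p)) < tol:
            return new_p
        p = new_p
    return p

# Hypothetical pairwise preference counts among three DMER models A, B, C
wins = [
    [0, 8, 9],   # A preferred over B 8 times, over C 9 times
    [2, 0, 6],   # B preferred over A 2 times, over C 6 times
    [1, 4, 0],   # C preferred over A 1 time,  over B 4 times
]
strengths = bradley_terry(wins)
ranking = sorted(range(len(wins)), key=lambda i: -strengths[i])  # [0, 1, 2]: A > B > C
```

In practice the pairwise outcomes would come from human (or automatic) preference judgments over pairs of emotion descriptions; the ranking then follows without any reference descriptions.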