🤖 AI Summary
This work investigates the causes and implications of prediction disagreement arising from modality conflicts in multimodal empathy detection. To address the performance degradation that fusion models suffer when textual, acoustic, and visual signals are inconsistent, we propose treating model disagreement as a diagnostic indicator of semantic ambiguity. Our approach combines fine-tuned unimodal baselines with a gated multimodal fusion model, and pairs disagreement analysis with an evaluation of human annotator consistency. Key contributions include: (1) identification that a dominant modality lacking cross-modal support can mislead fusion decisions; (2) empirical evidence that model disagreement correlates strongly with annotator uncertainty, enabling effective detection of samples on which the system is fragile; and (3) demonstration that humans do not consistently benefit from multimodal inputs, underscoring the inherent ambiguity of emotional expression. These findings establish a paradigm for robust empathy modeling and uncertainty-aware inference.
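To make the fusion component concrete, the following is a minimal sketch of a gated multimodal fusion model in PyTorch. It is not the paper's released implementation: the embedding dimensions, layer sizes, and module names are assumptions chosen for illustration, and the gate simply learns per-sample weights over the three modality embeddings.

```python
# Minimal sketch of gated multimodal fusion (illustrative, not the authors' code).
# Each modality embedding is projected to a shared space, a gate network assigns
# per-sample weights to the three modalities, and the weighted sum is classified.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, text_dim=768, audio_dim=512, video_dim=512,
                 hidden_dim=256, num_classes=2):
        super().__init__()
        # Project each modality into a shared hidden space.
        self.proj_text = nn.Linear(text_dim, hidden_dim)
        self.proj_audio = nn.Linear(audio_dim, hidden_dim)
        self.proj_video = nn.Linear(video_dim, hidden_dim)
        # Gate network: scores how much each modality contributes per sample.
        self.gate = nn.Linear(hidden_dim * 3, 3)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_emb, audio_emb, video_emb):
        h_t = torch.tanh(self.proj_text(text_emb))
        h_a = torch.tanh(self.proj_audio(audio_emb))
        h_v = torch.tanh(self.proj_video(video_emb))
        # Per-sample gate weights over the three modalities (softmax sums to 1).
        weights = torch.softmax(
            self.gate(torch.cat([h_t, h_a, h_v], dim=-1)), dim=-1)
        fused = (weights[:, 0:1] * h_t
                 + weights[:, 1:2] * h_a
                 + weights[:, 2:3] * h_v)
        # Returning the weights exposes which modality dominated each decision,
        # which is useful for the kind of conflict analysis described above.
        return self.classifier(fused), weights
```

Inspecting the returned gate weights alongside the unimodal predictions is one way to check whether a dominant modality is driving the fusion output without cross-modal support.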
📝 Abstract
Multimodal models play a key role in empathy detection, but their performance can suffer when modalities provide conflicting cues. To understand these failures, we examine cases where unimodal and multimodal predictions diverge. Using fine-tuned models for text, audio, and video, along with a gated fusion model, we find that such disagreements often reflect underlying ambiguity, as evidenced by annotator uncertainty. Our analysis shows that dominant signals in one modality can mislead fusion when unsupported by others. We also observe that humans, like models, do not consistently benefit from multimodal input. These insights position disagreement as a useful diagnostic signal for identifying challenging examples and improving empathy system robustness.
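As a rough illustration of the disagreement diagnostic, the sketch below flags samples where any unimodal prediction diverges from the fusion prediction and computes a simple per-sample annotator uncertainty score for comparison. The function names, the majority-vote uncertainty measure, and the toy data are illustrative assumptions, not the paper's exact procedure.

```python
# Illustrative sketch: use unimodal/fusion disagreement as a diagnostic signal
# and compare flagged samples against annotator uncertainty (assumed measures).
import numpy as np

def disagreement_flags(text_pred, audio_pred, video_pred, fusion_pred):
    """True where any unimodal prediction differs from the fusion prediction."""
    unimodal = np.stack([text_pred, audio_pred, video_pred], axis=1)
    return np.any(unimodal != fusion_pred[:, None], axis=1)

def annotator_uncertainty(labels_per_annotator):
    """Fraction of annotators deviating from the per-sample majority label."""
    labels = np.asarray(labels_per_annotator)      # shape: (n_samples, n_annotators)
    majority = np.round(labels.mean(axis=1))
    return (labels != majority[:, None]).mean(axis=1)

# Toy example: 4 samples, binary empathy labels, 3 annotators.
flags = disagreement_flags(np.array([1, 0, 1, 1]), np.array([1, 1, 0, 1]),
                           np.array([1, 0, 1, 0]), np.array([1, 0, 1, 1]))
unc = annotator_uncertainty([[1, 1, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0]])
print(flags)  # which samples show model disagreement
print(unc)    # per-sample annotator uncertainty
```

Under the paper's finding that disagreement correlates with annotator uncertainty, samples flagged by such a mask would be natural candidates for manual review or uncertainty-aware handling.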