🤖 AI Summary
Empathic response detection faces challenges including ill-defined task formulations, fragmented multimodal modeling, and the absence of systematic surveys. This paper conducts a cross-modal systematic review of 62 high-quality studies. Methodologically, it unifies network architecture design principles along modality dimensions for the first time and proposes a standardized three-tier interaction framework (individual, dyadic, and group) together with a five-category task taxonomy that includes local/global empathy recognition and emotional contagion detection. It constructs the first structured empathy detection knowledge graph, integrating four input modalities (text, audio, audiovisual, and physiological signals), twelve publicly available datasets, and seven reproducible codebases. By synergizing NLP, audiovisual modeling, speech emotion analysis, and time-frequency physiological signal processing, augmented with cross-modal contrastive learning and meta-analysis, the study identifies critical research gaps, establishing both theoretical foundations and practical guidelines for robust empathic computing.
📝 Abstract
Empathy indicates an individual's ability to understand others. Over the past few years, empathy has drawn attention from various disciplines, including but not limited to Affective Computing, Cognitive Science, and Psychology. Detecting empathy has potential applications in society, healthcare and education. Despite being a broad and overlapping topic, empathy detection using Machine Learning remains underexplored from a systematic literature review perspective. We collected 829 papers from 10 well-known databases, systematically screened them and analysed the final 62 papers. Our analyses reveal several prominent task formulations (including empathy on localised utterances or overall expressions, unidirectional or parallel empathy, and emotional contagion) in monadic, dyadic and group interactions. Empathy detection methods are summarised based on four input modalities (text, audiovisual, audio and physiological signals), thereby presenting modality-specific network architecture design protocols. We discuss challenges, research gaps and potential applications in the Affective Computing-based empathy domain, which can facilitate new avenues of exploration. We further list publicly available datasets and code. This paper, therefore, provides a structured overview of recent advancements and remaining challenges towards developing a robust empathy detection system that could meaningfully contribute to enhancing human well-being.