🤖 AI Summary
This study addresses video-based empathy recognition in privacy-sensitive settings, where only pre-extracted visual features (in tabular format) are available and raw videos are inaccessible. We pioneer the application of tabular foundation models, TabPFN v2 and TabICL, to cross-subject empathy detection under such constraints. To ensure ecological validity, we propose an individual-generalisation evaluation framework grounded in strict cross-subject validation protocols, leveraging both in-context learning and fine-tuning strategies. On a human-robot interaction benchmark, our approach achieves a cross-subject accuracy of 0.730 (+14.0 percentage points) and AUC of 0.669 (+10.5 percentage points), significantly outperforming strong baselines including conventional tree-based models. Our key contributions are threefold: (1) the first adaptation of tabular foundation models to computational empathy; (2) the design of a subject-generalisable evaluation paradigm aligned with real-world deployment requirements; and (3) a practical, privacy-compliant pathway for multimodal affective understanding under data-restriction constraints.
📝 Abstract
Detecting empathy from video interactions is an emerging area of research. Video datasets, however, are often released as extracted features (i.e., tabular data) rather than raw footage due to privacy and ethical concerns. Prior research on such tabular datasets established tree-based classical machine learning approaches as the best-performing models. Motivated by the recent success of textual foundation models (i.e., large language models), we explore the use of tabular foundation models in empathy detection from tabular visual features. We experiment with two recent tabular foundation models, TabPFN v2 and TabICL, through in-context learning and fine-tuning setups. Our experiments on a public human-robot interaction benchmark demonstrate a significant boost in cross-subject empathy detection accuracy over several strong baselines (accuracy: $0.590 \rightarrow 0.730$; AUC: $0.564 \rightarrow 0.669$). In addition to the performance improvement, we contribute novel insights and an evaluation setup that ensures generalisation to unseen subjects in this public benchmark. As the practice of releasing video features as tabular datasets is likely to persist due to privacy constraints, our findings will be widely applicable to future empathy detection video datasets as well.