A Multimodal Neural Network for Recognizing Subjective Self-Disclosure Towards Social Robots

📅 2025-08-14

🤖 AI Summary

This study addresses the challenge of accurately recognizing users’ subjective self-disclosure in interactions with social robots. Methodologically, we (1) collect a large video corpus for self-disclosure recognition; (2) propose a scale-preserving cross-entropy loss function that improves on both the classification and the regression formulation of the problem; and (3) integrate visual, acoustic, and textual modalities within an end-to-end framework enhanced by hierarchical attention mechanisms. Experimental results show that our best-performing model, trained with the new loss, achieves an F1-score of 0.83, an improvement of 0.48 over the best baseline model. This is a step toward social robots that can pick up on an interaction partner’s self-disclosures. Key contributions include: (i) a dedicated multimodal dataset for self-disclosure recognition; (ii) a novel loss function that benefits both classification and regression; and (iii) an end-to-end multimodal attention architecture for self-disclosure analysis.

📝 Abstract

Subjective self-disclosure is an important feature of human social interaction. While much has been done in the social and behavioural literature to characterise the features and consequences of subjective self-disclosure, little work has been done thus far to develop computational systems that are able to accurately model it. Even less work has attempted to model specifically how human interactants self-disclose with robotic partners. This gap is becoming more pressing as we require social robots to work in conjunction with, and establish relationships with, humans in various social settings. In this paper, we develop a custom multimodal attention network based on models from the emotion recognition literature, train this model on a large self-collected self-disclosure video corpus, and construct a new loss function, the scale preserving cross entropy loss, that improves upon both classification and regression versions of this problem. Our results show that the best performing model, trained with our novel loss function, achieves an F1 score of 0.83, an improvement of 0.48 over the best baseline model. This result makes significant headway towards allowing social robots to pick up on an interaction partner's self-disclosures, an ability that will be essential in social robots with social cognition.
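
The abstract does not define the scale preserving cross entropy loss, so the following is not the authors' formulation. As a purely illustrative assumption, it shows a different, generic technique: cross entropy against soft targets that decay with distance from the true level on an ordered rating scale. The function names, the 5-level scale, and the `temperature` parameter are all hypothetical. The sketch only illustrates why a loss that respects the scale can behave differently from plain one-hot cross entropy, which would score both example predictions identically.

```python
import math


def soft_ordinal_targets(true_level, num_levels, temperature=1.0):
    """Target distribution that decays with distance from the true level.

    Hypothetical helper, not taken from the paper.
    """
    weights = [math.exp(-abs(k - true_level) / temperature)
               for k in range(num_levels)]
    total = sum(weights)
    return [w / total for w in weights]


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def ordinal_cross_entropy(logits, true_level, temperature=1.0):
    """Cross entropy between the soft ordinal targets and the prediction."""
    targets = soft_ordinal_targets(true_level, len(logits), temperature)
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(targets, log_probs))


# Assumed 5-level disclosure scale; the true level is index 3.
near_miss = [0.0, 0.0, 0.0, 0.0, 4.0]  # confident in level 4, one step off
far_miss = [4.0, 0.0, 0.0, 0.0, 0.0]   # confident in level 0, three steps off

print(ordinal_cross_entropy(near_miss, 3))
print(ordinal_cross_entropy(far_miss, 3))
```

Under this assumed loss, the prediction one step away from the true level is penalised less than the prediction three steps away, so the ordering of the scale influences training even though the output is still a distribution over discrete classes.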

Problem

Research questions and friction points this paper is trying to address.

Modeling human self-disclosure with robotic partners

Developing multimodal neural networks for self-disclosure recognition, building on models from emotion recognition

Improving both the classification and regression versions of self-disclosure recognition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal attention network for self-disclosure recognition

Scale preserving cross entropy loss that improves both classification and regression

Large self-collected video corpus for model training
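
The multimodal attention idea listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the architecture from the paper: each modality is reduced to one fixed-size embedding, a single query vector scores the modalities with scaled dot-product attention, and the fused representation is the weighted sum. The function `attention_fuse`, the example embeddings, and the query values are hypothetical; in a trained network the query and the embeddings would be learned.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(embeddings, query):
    """Fuse per-modality embeddings with scaled dot-product attention.

    embeddings: dict mapping modality name to a vector (list of floats).
    query: vector with the same dimension as each embedding.
    Returns the fused vector and the attention weight per modality.
    """
    names = list(embeddings)
    dim = len(query)
    scores = [
        sum(q * x for q, x in zip(query, embeddings[name])) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    fused = [
        sum(w * embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Made-up 4-dimensional embeddings for one interaction clip.
embeddings = {
    "visual": [0.2, 0.1, 0.0, 0.4],
    "acoustic": [0.9, 0.3, 0.5, 0.1],
    "textual": [0.1, 0.8, 0.2, 0.6],
}
query = [1.0, 0.0, 1.0, 0.0]  # assumed fixed here; learned in practice

fused, weights = attention_fuse(embeddings, query)
print(weights)
print(fused)
```

The attention weights sum to one, so they can be read as the relative contribution of each modality to the fused vector for a given input.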
