🤖 AI Summary
This study addresses the challenge of accurately identifying users’ subjective self-disclosure behaviors in social robotics. Methodologically, we (1) construct the first large-scale video corpus specifically designed for self-disclosure recognition; (2) propose a scale-preserving cross-entropy loss function to jointly optimize both classification (disclosure type) and regression (disclosure intensity) tasks; and (3) integrate visual, acoustic, and textual modalities within an end-to-end framework enhanced by hierarchical attention mechanisms. Experimental results show that our best-performing model achieves an F1 score of 0.83, an improvement of 0.48 over the best baseline model, which strengthens robots’ fine-grained perception of human self-disclosure. Key contributions include: (i) the first dedicated multimodal dataset for self-disclosure recognition; (ii) a novel unified loss function enabling joint classification-regression learning; and (iii) an interpretable, end-to-end multimodal neural architecture for self-disclosure analysis.
📝 Abstract
Subjective self-disclosure is an important feature of human social interaction. While much has been done in the social and behavioural literature to characterise the features and consequences of subjective self-disclosure, little work has been done thus far to develop computational systems that can accurately model it. Even less work has attempted to model specifically how human interactants self-disclose to robotic partners. This gap is becoming more pressing as we require social robots to work alongside humans and establish relationships with them in various social settings. In this paper, we develop a custom multimodal attention network based on models from the emotion recognition literature, train it on a large self-collected self-disclosure video corpus, and construct a new loss function, the scale preserving cross entropy loss, that improves performance on both the classification and regression versions of this problem. Our results show that the best performing model, trained with our novel loss function, achieves an F1 score of 0.83, an improvement of 0.48 over the best baseline model. This result makes significant headway towards allowing social robots to pick up on an interaction partner's self-disclosures, an ability that will be essential for social robots with social cognition.