AI Summary
This study addresses the challenge of recognizing subtle emotional responses, particularly to auditory stimuli such as name-calling, in children with Autism Spectrum Disorder (ASD) during human-robot interaction (HRI) with the NAO robot.
Method: We propose an end-to-end vision-geometry joint modeling framework: (i) a novel architecture integrating ResNet-50 and a three-layer Graph Convolutional Network (GCN), leveraging MediaPipe FaceMesh facial landmarks and KL-divergence-driven embedding fusion; and (ii) dual-model weighted soft labeling using DeepFace and FER for seven-class probabilistic emotion annotation.
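The dual-model soft labeling described above can be sketched as a weighted average of two per-frame emotion distributions. This is a minimal illustration, not the paper's implementation: the probability vectors are supplied directly here (in the actual pipeline they would come from DeepFace and FER inference), and the equal weights are an assumption, not values reported by the study.

```python
import numpy as np

# Standard seven-class emotion taxonomy used by both DeepFace and FER
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def weighted_soft_label(p_deepface, p_fer, w_deepface=0.5, w_fer=0.5):
    """Fuse two length-7 emotion probability vectors into one soft label.

    p_deepface, p_fer: per-frame class distributions (hypothetical inputs
    standing in for the two models' outputs). The weights are illustrative;
    the paper does not report its exact weighting.
    """
    fused = w_deepface * np.asarray(p_deepface) + w_fer * np.asarray(p_fer)
    return fused / fused.sum()  # renormalize to a valid distribution

# Example: two models that both lean toward "happy" for a frame
soft = weighted_soft_label(
    [0.10, 0.00, 0.00, 0.70, 0.10, 0.00, 0.10],
    [0.20, 0.00, 0.00, 0.50, 0.20, 0.00, 0.10],
)
```

The soft label preserves each model's uncertainty rather than collapsing to a single hard class, which is what makes KL-divergence training against it meaningful.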
Contribution/Results: We introduce the first large-scale, real-world facial dataset of ASD children interacting with robots in India (50,000 frames, 15 participants), filling a critical gap in neurodiverse HRI affective data. Our method achieves state-of-the-art performance on fine-grained seven-class micro-expression classification, significantly improving robustness and interpretability, and enabling clinically deployable, human-in-the-loop therapeutic interventions.
Abstract
Understanding emotional responses in children with Autism Spectrum Disorder (ASD) during social interaction remains a critical challenge in both developmental psychology and human-robot interaction. This study presents a novel deep learning pipeline for emotion recognition in autistic children in response to a name-calling event by a humanoid robot (NAO), under controlled experimental settings. The dataset comprises approximately 50,000 facial frames extracted from video recordings of 15 children with ASD. A hybrid model combining a fine-tuned ResNet-50-based Convolutional Neural Network (CNN) and a three-layer Graph Convolutional Network (GCN) was trained on both visual and geometric features extracted from MediaPipe FaceMesh landmarks. Emotions were probabilistically labeled using a weighted ensemble of two models, DeepFace and FER, each contributing to soft-label generation across seven emotion classes. Final classification leveraged a fused embedding optimized via Kullback-Leibler divergence. The proposed method demonstrates robust performance in modeling subtle affective responses and shows significant promise for affective profiling of children with ASD in clinical and therapeutic human-robot interaction contexts. By effectively capturing micro-expression cues in neurodivergent children, the pipeline addresses a major gap in autism-specific HRI research. This work represents the first such large-scale, real-world dataset and pipeline from India on autism-focused emotion analysis using social robotics, contributing an essential foundation for future personalized assistive technologies.
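The KL-divergence objective mentioned in the abstract measures how far the fused embedding's predicted class distribution is from the ensemble soft label. A minimal sketch of that measure follows; the function name and the epsilon smoothing are illustrative assumptions, not details from the paper, and a real training loop would use a framework loss (e.g. a KL-divergence loss in PyTorch) over batches.

```python
import numpy as np

def kl_divergence(soft_label, prediction, eps=1e-9):
    """KL(soft_label || prediction) over the seven emotion classes.

    Minimizing this drives the model's predicted distribution toward
    the DeepFace/FER ensemble soft label. eps guards against log(0)
    for classes assigned zero probability (an illustrative choice).
    """
    p = np.asarray(soft_label, dtype=float) + eps
    q = np.asarray(prediction, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()  # keep both valid distributions
    return float(np.sum(p * np.log(p / q)))
```

The divergence is zero when prediction and soft label agree exactly, and grows as the predicted distribution drifts from the label, so it serves directly as a training signal for the fused CNN+GCN head.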