🤖 AI Summary
Current UX evaluation in human-robot interaction (HRI) is fragmented and largely static, failing to capture the temporal dynamics of user experience. To address this, we propose a dynamic UX estimation method grounded in multimodal social signals, specifically facial expressions and speech, integrated within an end-to-end framework that combines multiple-instance learning with a Transformer architecture. Unlike conventional single-time-point assessments, the approach explicitly models UX fluctuations at both short-term (e.g., second-scale emotional shifts) and long-term (e.g., session-level adaptation) temporal scales. Empirical evaluation shows statistically significant improvements in UX estimation accuracy over third-party human evaluators (p < 0.01). The method offers a deployable technical foundation for social robots to perceive fine-grained user states in real time and adapt their behavior accordingly.
📝 Abstract
In recent years, the demand for social robots has grown, requiring them to adapt their behaviors based on users' states. Accurately assessing user experience (UX) in human-robot interaction (HRI) is crucial for achieving this adaptability. UX is a multi-faceted measure encompassing aspects such as sentiment and engagement, yet existing methods often focus on these individually. This study proposes a UX estimation method for HRI by leveraging multimodal social signals. We construct a UX dataset and develop a Transformer-based model that utilizes facial expressions and voice for estimation. Unlike conventional models that rely on momentary observations, our approach captures both short- and long-term interaction patterns using a multi-instance learning framework. This enables the model to capture temporal dynamics in UX, providing a more holistic representation. Experimental results demonstrate that our method outperforms third-party human evaluators in UX estimation.
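To make the modeling idea concrete, here is a minimal sketch of how a Transformer encoder combined with attention-based multiple-instance learning (MIL) pooling could map per-segment facial and vocal features to an interaction-level UX estimate. All module names, feature dimensions, and the specific pooling mechanism are illustrative assumptions, not the paper's actual architecture.

```python
# Minimal illustrative sketch (PyTorch). Feature dimensions, layer sizes, and
# the attention-based MIL pooling are assumptions for illustration only.
import torch
import torch.nn as nn

class MultimodalUXEstimator(nn.Module):
    def __init__(self, face_dim=128, voice_dim=64, d_model=256,
                 n_heads=4, n_layers=2, n_ux_levels=5):
        super().__init__()
        # Project per-segment facial and vocal features into a shared space.
        self.face_proj = nn.Linear(face_dim, d_model)
        self.voice_proj = nn.Linear(voice_dim, d_model)
        # Transformer encoder models short-term dynamics across segments ("instances").
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, n_layers)
        # Attention-based MIL pooling aggregates instance embeddings into one
        # bag-level (whole-interaction) representation.
        self.mil_attn = nn.Sequential(nn.Linear(d_model, 64), nn.Tanh(), nn.Linear(64, 1))
        self.classifier = nn.Linear(d_model, n_ux_levels)

    def forward(self, face_feats, voice_feats):
        # face_feats, voice_feats: (batch, n_segments, feature_dim)
        x = self.face_proj(face_feats) + self.voice_proj(voice_feats)  # fuse modalities
        h = self.encoder(x)                                 # (batch, n_segments, d_model)
        w = torch.softmax(self.mil_attn(h), dim=1)          # instance attention weights
        bag = (w * h).sum(dim=1)                            # bag-level embedding
        return self.classifier(bag)                         # UX-level logits

# Usage: 8 interactions, each split into 30 short segments.
model = MultimodalUXEstimator()
logits = model(torch.randn(8, 30, 128), torch.randn(8, 30, 64))
print(logits.shape)  # torch.Size([8, 5])
```

In this sketch the MIL view treats each interaction as a bag of short segments, so the model can weight the segments that matter most while still producing a single holistic UX estimate for the session.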