Personalized Socially Assistive Robots With End-to-End Speech-Language Models For Well-Being Support

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current socially assistive robots (SARs) exhibit significant limitations in real-time responsiveness, empathic feedback generation, and personalized spoken interaction, hindering their efficacy in mental health support. To address this, we propose the first integration of an end-to-end speech-language model (SLM) into an SAR architecture, enabling a low-latency, emotion-adaptive spoken dialogue framework with natural turn-taking that unifies real-time speech understanding, empathic response generation, and anthropomorphic speech synthesis. User studies demonstrate statistically significant improvements in perceived conversational naturalness and empathy (p < 0.01); however, nonverbal behavioral synchronization and lexical diversity require further refinement. This work establishes a scalable, SLM-driven paradigm for SAR design in mental healthcare, advancing the deployment of embodied, affective human–robot spoken interaction.

📝 Abstract
Socially assistive robots (SARs) have shown great potential for supplementing well-being support. However, prior studies have found that existing dialogue pipelines for SARs remain limited in real-time latency, back-channeling, and personalized speech dialogue. Toward addressing these limitations, we propose integrating end-to-end speech-language models (SLMs) with SARs. This work 1) evaluated the usability of an SLM-enabled SAR dialogue system through a small user study, and 2) identified remaining limitations through user feedback from that study to inform future improvements. We conducted a small within-participants user study with university students (N = 11); the results showed that participants perceived an SLM-enabled SAR system as capable of providing empathetic feedback, natural turn-taking, back-channeling, and adaptive responses. We also found that participants reported the robot's nonverbal behaviors as lacking variability and synchronization with the conversation, and the SLM's verbal feedback as generic and repetitive. These findings highlight the need for real-time robot movement synchronized with conversation, improved prompting or fine-tuning to generate outputs better aligned with mental health practices, and more expressive, adaptive vocal generation.
Problem

Research questions and friction points this paper is trying to address.

Improving real-time latency in SAR dialogue systems
Enhancing personalized speech dialogue for SARs
Synchronizing nonverbal behaviors with SLM conversations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrated end-to-end speech-language models for SARs
SLM-enabled SAR system for empathetic feedback
Real-time robot movement synchronized with conversation
Mengxue Fu
University of Southern California
Artificial Intelligence · Robotics
Zhonghao Shi
Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Minyu Huang
Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Siqi Liu
Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Mina Kian
University of Southern California
Socially Assistive Robotics · Human-Robot Interaction · Natural Language Processing
Yirui Song
Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Maja J. Matarić
Department of Computer Science, University of Southern California, Los Angeles, CA, USA