📝 Abstract
Socially assistive robots (SARs) have shown great potential for supplementing well-being support. However, prior studies have found that existing dialogue pipelines for SARs remain limited in real-time latency, back-channeling, and personalized speech dialogue. Toward addressing these limitations, we propose integrating end-to-end speech-language models (SLMs) with SARs. This work 1) evaluated the usability of an SLM-enabled SAR dialogue system through a small user study, and 2) identified remaining limitations from participant feedback to inform future improvements. We conducted a small within-participants user study with university students (N = 11); the results showed that participants perceived the SLM-enabled SAR system as capable of providing empathetic feedback, natural turn-taking, back-channeling, and adaptive responses. We also found that participants reported the robot's nonverbal behaviors as lacking variability and synchronization with the conversation, and the SLM's verbal feedback as generic and repetitive. These findings highlight the need for real-time robot movement synchronized with conversation, improved prompting or fine-tuning to generate outputs better aligned with mental health practices, and more expressive, adaptive vocal generation.