AI Summary
This paper addresses the lack of affective expressivity in speech-to-text (STT) systems for extended reality (XR), which impedes communication accessibility and empathic engagement for deaf and hard-of-hearing users. We systematically review advances from 2020–2024 and propose the first integrated STT enhancement framework incorporating real-time emotion recognition, multimodal physiological signal analysis (electrodermal activity, heart rate), AR/VR-based affective rendering, and animated/emoticon-augmented captions. By unifying affective computing, low-latency ASR, and expressive visualization techniques, our framework advances STT beyond semantic transcription toward emotionally aware transcription. We identify five core research themes and trace their technical evolution, culminating in a deployable design paradigm for affect-enhanced captioning. The work provides both theoretical foundations and practical guidelines for empathic human–machine interaction in high-stakes domains such as education and healthcare.
Abstract
This narrative review of emotional expression in Speech-to-Text (STT) interfaces for Extended Reality (XR) aims to identify advancements, limitations, and research gaps in incorporating emotional expression into the transcribed text generated by STT systems. Using a rigorous search strategy, relevant articles published between 2020 and 2024 were extracted and categorized into themes: communication enhancement technologies, innovations in captioning, visual and affective augmentation, emotion recognition in AR and VR, and empathic machines. The findings trace the evolution of tools and techniques for meeting the needs of individuals with hearing impairments, showcasing innovations in live transcription, closed captioning, AR, VR, and emotion recognition technologies. Despite improvements in accessibility, the absence of emotional nuance in transcribed text remains a significant communication barrier, underscoring the urgency of STT innovations that capture emotional expression. Strategies for integrating emotion into text are discussed, including animated text captions, emojilization tools, and models that associate emotions with animation properties. Extending these efforts into AR and VR environments opens new possibilities for immersive and emotionally resonant experiences, especially in educational contexts. The review also explores empathic applications in healthcare, education, and human-robot interaction, highlighting their potential for personalized and effective interactions. The multidisciplinary nature of the literature underscores the potential for collaborative, interdisciplinary research.
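To make the affect-augmented captioning strategies discussed above concrete, the following is a minimal, hypothetical sketch of how a recognized emotion label and intensity could be mapped to caption rendering properties (an appended emoji, a color, and an animation amplitude). All names, mappings, and values here are illustrative assumptions, not drawn from any specific system in the reviewed literature.

```python
# Hypothetical affect-augmented captioning sketch: map an emotion label
# plus an intensity in [0, 1] onto caption rendering properties.
from dataclasses import dataclass

# Illustrative emotion-to-style table; a real system would derive these
# mappings from user studies or a learned emotion-animation model.
EMOTION_STYLES = {
    "joy":     {"emoji": "😊", "color": "#FFD166", "base_amplitude": 0.6},
    "sadness": {"emoji": "😢", "color": "#118AB2", "base_amplitude": 0.2},
    "anger":   {"emoji": "😠", "color": "#EF476F", "base_amplitude": 0.9},
    "neutral": {"emoji": "",   "color": "#FFFFFF", "base_amplitude": 0.0},
}

@dataclass
class CaptionStyle:
    text: str                    # transcript, optionally emoji-augmented
    color: str                   # caption text color (hex)
    animation_amplitude: float   # e.g., pulse/shake strength in the renderer

def augment_caption(transcript: str, emotion: str, intensity: float) -> CaptionStyle:
    """Attach affective cues to a plain STT transcript.

    Unknown emotion labels fall back to a neutral style so captions
    always remain readable; intensity scales the animation strength.
    """
    style = EMOTION_STYLES.get(emotion, EMOTION_STYLES["neutral"])
    text = f"{transcript} {style['emoji']}".strip()
    amplitude = min(1.0, style["base_amplitude"] * max(0.0, intensity))
    return CaptionStyle(text=text, color=style["color"], animation_amplitude=amplitude)

caption = augment_caption("I can't believe we won!", "joy", 0.8)
```

In an AR/VR renderer, `animation_amplitude` could drive per-character motion while the emoji and color provide at-a-glance affective cues alongside the semantic transcript.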