AI Summary
This paper addresses the lack of affective expressivity in speech-to-text (STT) systems for extended reality (XR), which impedes communication accessibility and empathic engagement for deaf and hard-of-hearing users. We systematically review advances from 2020–2024 and propose the first integrated STT enhancement framework incorporating real-time emotion recognition, multimodal physiological signal analysis (electrodermal activity, heart rate), AR/VR-based affective rendering, and animated/emoticon-augmented captions. By unifying affective computing, low-latency ASR, and expressive visualization techniques, our framework advances STT beyond semantic transcription toward emotionally aware transcription. We identify five core research themes and trace their technical evolution, culminating in a deployable design paradigm for affect-enhanced captioning. The work provides both theoretical foundations and practical guidelines for empathic human–machine interaction in high-stakes domains such as education and healthcare.
Abstract
This narrative review of emotional expression in Speech-to-Text (STT) interfaces for Extended Reality (XR) aims to identify advancements, limitations, and research gaps in incorporating emotional expression into the transcribed text generated by STT systems. Using a rigorous search strategy, relevant articles published between 2020 and 2024 were extracted and categorized into themes: communication enhancement technologies, innovations in captioning, visual and affective augmentation, emotion recognition in AR and VR, and empathic machines. The findings trace the evolution of tools and techniques for meeting the needs of individuals with hearing impairments, showcasing innovations in live transcription, closed captioning, AR, VR, and emotion recognition technologies. Despite improvements in accessibility, the absence of emotional nuance in transcribed text remains a significant communication barrier, underscoring the urgency of STT innovations that capture emotional expression. Strategies for integrating emotion into text are discussed, including animated text captions, emojilization tools, and models that associate emotions with animation properties. Extending these efforts into AR and VR environments opens new possibilities for immersive and emotionally resonant experiences, especially in educational contexts. The review also explores empathic applications in healthcare, education, and human-robot interaction, highlighting their potential for personalized and effective interactions. The multidisciplinary nature of the literature underscores the potential for collaborative, interdisciplinary research.
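To make the affect-augmented captioning strategies discussed above concrete, the following is a minimal, hypothetical sketch of how a recognized emotion label and intensity could be mapped to caption rendering properties (an appended emoji, a color, and an animation amplitude). All names, mappings, and values here are illustrative assumptions, not drawn from any specific system in the reviewed literature.

```python
# Hypothetical affect-augmented captioning sketch: map an emotion label
# plus an intensity in [0, 1] onto caption rendering properties.
from dataclasses import dataclass

# Illustrative emotion-to-style table; a real system would derive these
# mappings from user studies or a learned emotion-animation model.
EMOTION_STYLES = {
    "joy":     {"emoji": "😊", "color": "#FFD166", "base_amplitude": 0.6},
    "sadness": {"emoji": "😢", "color": "#118AB2", "base_amplitude": 0.2},
    "anger":   {"emoji": "😠", "color": "#EF476F", "base_amplitude": 0.9},
    "neutral": {"emoji": "",   "color": "#FFFFFF", "base_amplitude": 0.0},
}

@dataclass
class CaptionStyle:
    text: str                    # transcript, optionally emoji-augmented
    color: str                   # caption text color (hex)
    animation_amplitude: float   # e.g., pulse/shake strength in the renderer

def augment_caption(transcript: str, emotion: str, intensity: float) -> CaptionStyle:
    """Attach affective cues to a plain STT transcript.

    Unknown emotion labels fall back to a neutral style so captions
    always remain readable; intensity scales the animation strength.
    """
    style = EMOTION_STYLES.get(emotion, EMOTION_STYLES["neutral"])
    text = f"{transcript} {style['emoji']}".strip()
    amplitude = min(1.0, style["base_amplitude"] * max(0.0, intensity))
    return CaptionStyle(text=text, color=style["color"], animation_amplitude=amplitude)

caption = augment_caption("I can't believe we won!", "joy", 0.8)
```

In an AR/VR renderer, `animation_amplitude` could drive per-character motion while the emoji and color provide at-a-glance affective cues alongside the semantic transcript.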