Advancing User-Voice Interaction: Exploring Emotion-Aware Voice Assistants Through a Role-Swapping Approach

📅 2025-02-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited empathic capability of emotion-aware voice assistants in responding to users’ negative affect. To overcome this limitation, we propose and empirically validate a role-reversal experimental paradigm wherein users actively regulate the AI’s emotional state rather than passively receiving preconfigured responses. Integrating speech emotion recognition (SER), acoustic feature analysis (RMS, ZCR, jitter), and natural language processing (sentiment polarity, type-token ratio), we find that users consistently respond to negative vocal inputs with neutral or positive prosody—revealing an implicit emotion-downregulation strategy. Crucially, this response pattern significantly enhances user trust and interaction comfort. Our work introduces the first dual-modal emotion-response modeling framework that jointly accounts for acoustic sensitivity and linguistic appropriateness. It provides empirical grounding and transferable design principles for developing culturally aware, context-adaptive empathic voice assistants.

📝 Abstract
As voice assistants (VAs) become increasingly integrated into daily life, the need for emotion-aware systems that can recognize and respond appropriately to user emotions has grown. While significant progress has been made in speech emotion recognition (SER) and sentiment analysis, effectively addressing user emotions, particularly negative ones, remains a challenge. This study explores human emotional response strategies in VA interactions using a role-swapping approach, where participants regulate AI emotions rather than receiving pre-programmed responses. Through speech feature analysis and natural language processing (NLP), we examined acoustic and linguistic patterns across various emotional scenarios. Results show that participants favor neutral or positive emotional responses when engaging with negative emotional cues, highlighting a natural tendency toward emotional regulation and de-escalation. Key acoustic indicators such as root mean square (RMS), zero-crossing rate (ZCR), and jitter were identified as sensitive to emotional states, while sentiment polarity and lexical diversity (TTR) distinguished between positive and negative responses. These findings provide valuable insights for developing adaptive, context-aware VAs capable of delivering empathetic, culturally sensitive, and user-aligned responses. By understanding how humans naturally regulate emotions in AI interactions, this research contributes to the design of more intuitive and emotionally intelligent voice assistants, enhancing user trust and engagement in human-AI interactions.
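The abstract names three of its acoustic and linguistic indicators explicitly: RMS (signal energy), ZCR (zero-crossing rate), and TTR (type-token ratio). The sketch below shows the standard textbook definitions of these features in plain Python; it is an illustration only, not the paper's actual feature-extraction pipeline (jitter and sentiment polarity are omitted, since they depend on pitch tracking and a sentiment lexicon not specified here).

```python
import math

def rms(samples):
    # Root mean square: average signal energy, a proxy for loudness.
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def zcr(samples):
    # Zero-crossing rate: fraction of adjacent sample pairs whose sign
    # differs; higher values roughly indicate noisier/higher-frequency speech.
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0))
    return crossings / (len(samples) - 1)

def ttr(text):
    # Type-token ratio: unique words divided by total words,
    # a simple measure of lexical diversity.
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens)

# Toy inputs (hypothetical, for illustration only).
signal = [0.1, -0.2, 0.3, -0.1, 0.05, -0.4]
print(round(rms(signal), 4))
print(round(zcr(signal), 2))
print(round(ttr("I hear you that sounds hard I am here"), 3))
```

In the study's framing, frame-level features like these would be computed over each participant utterance and compared across emotional scenarios.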
Problem

Research questions and friction points this paper is trying to address.

Emotion-aware voice assistants
Role-swapping emotional regulation
Speech feature and NLP analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Role-swapping approach for emotion regulation
Speech feature analysis for emotional states
NLP for empathetic voice assistant responses