🤖 AI Summary
This study addresses the joint modeling of emotional expressiveness and emphasis in expressive speech synthesis: how to use emphasis to strengthen emotional expressiveness while keeping the target emphasis perceptually clear and stable across emotions. The work proposes EME-TTS, a framework that applies weakly supervised learning with emphasis pseudo-labels and variance-based emphasis features. An Emphasis Perception Enhancement (EPE) block strengthens the interaction between emotional signals and emphasis positions. Experiments show that EME-TTS, combined with large language models for emphasis position prediction, produces more natural emotional speech while keeping the target emphasis stable and distinguishable across emotions.
📝 Abstract
In recent years, emotional Text-to-Speech (TTS) synthesis and emphasis-controllable speech synthesis have advanced significantly. However, their interaction remains underexplored. We propose Emphasis Meets Emotion TTS (EME-TTS), a novel framework designed to address two key research questions: (1) how to effectively utilize emphasis to enhance the expressiveness of emotional speech, and (2) how to maintain the perceptual clarity and stability of target emphasis across different emotions. EME-TTS employs weakly supervised learning with emphasis pseudo-labels and variance-based emphasis features. Additionally, the proposed Emphasis Perception Enhancement (EPE) block enhances the interaction between emotional signals and emphasis positions. Experimental results show that EME-TTS, when combined with large language models for emphasis position prediction, enables more natural emotional speech synthesis while preserving stable and distinguishable target emphasis across emotions. Synthesized samples are available online.