AffectCodec: Emotion-Preserving Neural Speech Codec for Expressive Speech Modeling

📅 2026-05-11
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the loss of emotional cues that existing neural speech codecs suffer during quantization, which upsets the balance among semantic fidelity, prosodic naturalness, and emotional expressiveness. The authors propose an emotion-guided end-to-end neural codec framework that explicitly preserves emotionally salient features in the compressed representation through three mechanisms: emotion–semantic guided latent modulation, relation-preserving emotion–semantic knowledge distillation, and emotion-weighted semantic alignment. Across speech reconstruction, emotion recognition, and downstream text-to-speech synthesis, the reported results show improved emotional consistency and perceptual quality while content accuracy is maintained.
📝 Abstract
Neural speech codecs provide discrete representations for speech language models, but emotional cues are often degraded during quantization. Existing codecs mainly optimize acoustic reconstruction, leaving emotion expressiveness insufficiently modeled at the representation level. We propose an emotion-guided neural speech codec that explicitly preserves emotional information while maintaining semantic fidelity and prosodic naturalness. Our framework combines emotion-semantic guided latent modulation, relation-preserving emotional-semantic distillation, and emotion-weighted semantic alignment to retain emotionally salient cues under compression. Extensive evaluations across speech reconstruction, emotion recognition, and downstream text-to-speech generation demonstrate improved emotion consistency and perceptual quality without sacrificing content accuracy.
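
The abstract names relation-preserving distillation but does not give its formulation. As a rough illustration only, the sketch below assumes one common reading of "relation-preserving": the pairwise similarity structure among codec latents in a batch is pushed toward the pairwise similarity structure among embeddings from an emotion teacher. The function names, the cosine-similarity choice, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between the rows of `vectors`."""
    # Fall back to 1.0 for an all-zero row so the division stays defined.
    norms = [math.sqrt(sum(x * x for x in v)) or 1.0 for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv)
         for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


def relation_distillation_loss(student, teacher):
    """Mean squared gap between student and teacher similarity matrices.

    Assumption: student and teacher hold one vector per utterance in the
    same order. Their dimensionalities may differ, because only the
    utterance-to-utterance relations are compared.
    """
    s = cosine_similarity_matrix(student)
    t = cosine_similarity_matrix(teacher)
    n = len(s)
    return sum(
        (s[i][j] - t[i][j]) ** 2 for i in range(n) for j in range(n)
    ) / (n * n)


# Hypothetical toy batch of three utterances.
codec_latents = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
teacher_embeddings = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [1.0, 1.0, 0.0]]
loss = relation_distillation_loss(codec_latents, teacher_embeddings)
```

In this toy batch the two sets of vectors live in different spaces but relate to each other in the same way, so `loss` comes out close to zero. A batch whose latents collapse onto one direction while the teacher keeps them apart would be penalized.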
Problem

Research questions and friction points this paper is trying to address.

neural speech codec
emotion preservation
expressive speech
emotional cues
speech representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

emotion-preserving codec
neural speech codec
emotional-semantic distillation
latent modulation
expressive speech modeling