🤖 AI Summary
This work addresses the severe and uneven loss of emotional information that discretized speech representations suffer under high compression. The authors analyze the impact of residual vector quantization (RVQ) on emotional content from both representation- and task-level perspectives and propose Emo-Q, an emotion-aware quantization method. Emo-Q preserves emotional characteristics at low bitrates by combining emotion-specific and emotion-biased codebooks with a lightweight routing mechanism. Experiments show that the approach significantly alleviates emotion degradation across diverse model architectures and emotion categories, improving accuracy on emotion recognition tasks.
📝 Abstract
Modern speech systems increasingly use discretized self-supervised speech representations for compression and integration with token-based models, yet their impact on emotional information remains unclear. We study how residual vector quantization (RVQ) reshapes emotional information in discrete speech representations from both representation- and task-level perspectives. Our analysis shows that aggressive compression disproportionately degrades emotion, with uneven loss across emotion classes and model architectures. To address this, we introduce emotion-aware quantization using emotion-specific and emotion-biased codebooks, improving the preservation of both hard and soft emotion perception. We further propose Emo-Q, a lightweight routed quantization method that selects emotion-specialized codebooks, improving emotion recognition performance at lower bitrates. These results highlight the importance of emotion-aware discretization for robust affective speech processing.
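The abstract describes RVQ and its routed, emotion-specialized variant only at a high level. As a rough illustration of the underlying mechanics (not the paper's actual Emo-Q implementation), the following NumPy sketch shows plain RVQ, where each stage quantizes the residual left by earlier stages, plus a hypothetical nearest-centroid "router" that selects one of several specialized codebook stacks. All function names, shapes, and the routing rule are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_codebook(num_codes, dim):
    """Random codebook with an explicit zero code, so a stage can
    'opt out' and the residual error never increases across stages."""
    return np.vstack([np.zeros((1, dim)), rng.standard_normal((num_codes, dim))])

def rvq_encode(x, codebooks):
    """Plain residual vector quantization: each stage quantizes the
    residual left over by the previous stages."""
    codes, recon = [], np.zeros_like(x)
    for cb in codebooks:                               # cb: (num_codes, dim)
        residual = x - recon
        idx = int(np.argmin(((cb - residual) ** 2).sum(axis=1)))
        codes.append(idx)
        recon = recon + cb[idx]
    return codes, recon

def routed_rvq_encode(x, codebook_stacks, router_centroids):
    """Illustrative routed variant (hypothetical): a nearest-centroid
    router picks one specialized codebook stack, then ordinary RVQ runs."""
    route = int(np.argmin(((router_centroids - x) ** 2).sum(axis=1)))
    codes, recon = rvq_encode(x, codebook_stacks[route])
    return route, codes, recon

dim, num_codes, num_stages, num_routes = 8, 16, 4, 3
stacks = [[make_codebook(num_codes, dim) for _ in range(num_stages)]
          for _ in range(num_routes)]
centroids = rng.standard_normal((num_routes, dim))

x = rng.standard_normal(dim)
route, codes, recon = routed_rvq_encode(x, stacks, centroids)
err_one_stage = np.linalg.norm(x - rvq_encode(x, stacks[route][:1])[1])
err_all_stages = np.linalg.norm(x - recon)
```

In a real system the router would be a small learned classifier over emotion cues rather than nearest-centroid matching, and the codebooks would be trained; the sketch only conveys how routing composes with residual quantization.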