Style Amnesia: Investigating Speaking Style Degradation and Mitigation in Multi-Turn Spoken Language Models

📅 2025-12-29

📈 Citations: 0

✨ Influential: 0

career value

191K/year

🤖 AI Summary

This study identifies and empirically characterizes “style amnesia” in spoken language models (SLMs)—a phenomenon wherein models recall initial paralinguistic style directives (e.g., emotion, accent, speaking rate) but fail to maintain consistent stylistic execution across multi-turn dialogues. To quantify this degradation, we introduce a novel multi-turn style controllability evaluation framework, incorporating instruction recall triggers and contrastive prompting experiments (system-level vs. user-level prompts). We systematically assess five state-of-the-art SLMs. Results reveal pervasive style inconsistency across all models; explicit user-side recall prompts improve style consistency by up to 27%, whereas system-level style instructions yield negligible control (<8% success rate). These findings challenge prevailing prompt-engineering paradigms and establish a new benchmark for evaluating and advancing controllable speech generation.

Technology Category

Application Category

📝 Abstract

In this paper, we show that when spoken language models (SLMs) are instructed to speak in a specific speaking style at the beginning of a multi-turn conversation, they cannot maintain the required speaking styles after several turns of interaction; we refer to this as the style amnesia of SLMs. We focus on paralinguistic speaking styles, including emotion, accent, volume, and speaking speed. We evaluate three proprietary and two open-source SLMs, demonstrating that none of these models can maintain a consistent speaking style when instructed to do so. We further show that when SLMs are asked to recall the style instruction in later turns, they can recall the style instruction, but they fail to express it throughout the conversation. We also show that explicitly asking the model to recall the style instruction can partially mitigate style amnesia. In addition, we examine various prompting strategies and find that SLMs struggle to follow the required style when the instruction is placed in system messages rather than user messages, which contradicts the intended function of system prompts.

Problem

Research questions and friction points this paper is trying to address.

Investigates style degradation in multi-turn spoken language models

Evaluates models' failure to maintain paralinguistic speaking styles consistently

Explores mitigation strategies for style amnesia in conversations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Explicitly recalling style instructions mitigates degradation

System messages less effective than user messages for style maintenance

Models recall instructions but fail to express styles consistently

🔎 Similar Papers

No similar papers found.