AI Summary
This work addresses the lack of systematic evaluation of speaking-style intensity control in spoken language models within multi-turn dialogues. To bridge this gap, we introduce StyleBench, the first benchmark specifically designed for evaluating style control in conversational speech synthesis. StyleBench comprises a multi-turn dialogue dataset annotated along four stylistic dimensions (emotion, speech rate, volume, and pitch) and incorporates a user-prompt-driven mechanism for fine-grained style intensity control. Through comprehensive stylistic annotations and automated evaluation metrics, StyleBench establishes a standardized framework that reveals a significant performance gap between current spoken language models and omni language models in controllable style generation. This benchmark provides both a diagnostic tool and a foundation to guide future research in controllable and expressive spoken dialogue systems.
Abstract
Speech language models (SLMs) have significantly extended the interactive capability of text-based Large Language Models (LLMs) by incorporating paralinguistic information. For a more realistic interactive experience with customized styles, current SLMs have managed to interpret and control speaking-style intensity from user prompts during the dialogue process. However, there remains a lack of systematic benchmarks that quantify and evaluate style intensity control in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark for comprehensively evaluating style intensity control across four dimensions: emotion, speed, volume, and pitch. Our results reveal performance gaps between leading SLMs and omni language models (OLMs), pointing to the underlying causes and promising directions for future exploration.