StyleBench: Evaluating Speech Language Models on Conversational Speaking Style Control

📅 2026-03-08
🤖 AI Summary
This work addresses the lack of systematic evaluation of speaking-style intensity control by speech language models (SLMs) in multi-turn dialogues. To bridge this gap, we introduce StyleBench, the first benchmark specifically designed for evaluating style control in conversational speech synthesis. StyleBench comprises a multi-turn dialogue dataset annotated along four stylistic dimensions (emotion, speech rate, volume, and pitch) and uses user prompts to request fine-grained changes in style intensity. By combining these stylistic annotations with automated evaluation metrics, StyleBench establishes a standardized framework, and its results reveal a significant performance gap between leading SLMs and omni language models (OLMs) in controllable style generation. The benchmark serves both as a diagnostic tool and as a foundation for future research on controllable and expressive spoken dialogue systems.

๐Ÿ“ Abstract
Speech language models (SLMs) have significantly extended the interactive capability of text-based Large Language Models (LLMs) by incorporating paralinguistic information. For a more realistic interactive experience with customized styles, current SLMs can interpret and control speaking style intensity from user prompts during the dialogue. However, there remains a lack of systematic benchmarks that quantify and evaluate style intensity control ability in conversations. In this paper, we propose StyleBench, a multi-turn dialogue benchmark for comprehensively evaluating style intensity control across four dimensions: emotion, speed, volume, and pitch. Our results reveal performance gaps between leading SLMs and omni language models (OLMs) and point to the underlying reasons and to promising approaches for future exploration.
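To make "style intensity control across turns" concrete, here is a minimal sketch of one way a scalar dimension such as speed, volume, or pitch could be scored over a dialogue. It is a hypothetical illustration, not StyleBench's data format or metric: the names `Request`, `Turn`, `observed_direction`, and `intensity_control_accuracy`, the assumption that each reply is reduced to a single acoustic measurement (e.g., syllables per second), and the per-dimension `tolerance` are all assumptions. Emotion is categorical and would need a different check.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Request(Enum):
    """Hypothetical per-turn user instruction for one style dimension."""
    INCREASE = 1
    DECREASE = -1
    KEEP = 0


@dataclass
class Turn:
    request: Request  # what the user asked for, relative to the previous reply
    measured: float   # assumed single acoustic measurement of this reply, e.g. syllables/s


def observed_direction(delta: float, tolerance: float) -> Request:
    """Map a change in the measurement to a direction; |delta| <= tolerance counts as KEEP."""
    if delta > tolerance:
        return Request.INCREASE
    if delta < -tolerance:
        return Request.DECREASE
    return Request.KEEP


def intensity_control_accuracy(turns: List[Turn], tolerance: float) -> float:
    """Fraction of turns after the first whose measured change matches the request.

    The first turn only sets the baseline. `tolerance` is in the measurement's own
    units and would be chosen per dimension. Illustrative only, not StyleBench's metric.
    """
    if len(turns) < 2:
        raise ValueError("need at least two turns to score a change")
    hits = 0
    for prev, cur in zip(turns, turns[1:]):
        if observed_direction(cur.measured - prev.measured, tolerance) == cur.request:
            hits += 1
    return hits / (len(turns) - 1)


# Usage with made-up speech-rate values (syllables per second), not benchmark data.
turns = [
    Turn(Request.KEEP, 4.0),      # baseline reply; its request is not scored
    Turn(Request.INCREASE, 4.8),  # user: "speak faster"
    Turn(Request.INCREASE, 5.5),  # user: "even faster"
    Turn(Request.DECREASE, 5.6),  # user: "slow down", but the rate does not drop
]
print(intensity_control_accuracy(turns, tolerance=0.2))  # 2 of the 3 scored turns match
```

Scoring only the direction of change keeps the sketch simple; a finer-grained variant could also compare the size of each change with the requested intensity level.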
Problem

Research questions and friction points this paper is trying to address.

speech language models
speaking style control
conversational benchmark
style intensity
paralinguistic information
Innovation

Methods, ideas, or system contributions that make the work stand out.

StyleBench
speech language models
style intensity control
multi-turn dialogue
paralinguistic evaluation
Haishu Zhao
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China
Aokai Hao
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China
Yuan Ge
Northeastern University, China
Reasoning, Multimodality LLMs
Zhenqiang Hong
NLP Lab, School of Computer Science and Engineering, Northeastern University, Shenyang, China
Tong Xiao
Professor in Computer Science, Northeastern University, China
Natural Language Processing, Machine Translation, Language Modeling
Jingbo Zhu
Northeastern University, China
Machine Translation, Language Parsing, Natural Language Processing