Full-Duplex-Bench: A Benchmark to Evaluate Full-duplex Spoken Dialogue Models on Turn-taking Capabilities

๐Ÿ“… 2025-03-06
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿ“„ PDF
๐Ÿค– AI Summary
Current evaluations of full-duplex spoken dialogue models rely on turn-level metrics or coarse corpus statistics and lack systematic assessment of critical real-time interactive behaviors such as pause handling, backchanneling, turn-taking, and interruption management. To address this gap, the authors propose Full-Duplex-Bench, a standardized benchmark designed specifically for full-duplex spoken dialogue models, which formally defines and quantifies these four dynamic dialogue behaviors. An automated evaluation framework applies automatic metrics to each behavior, enabling consistent, reproducible measurement of a model's interactive performance. The benchmark is released openly to support systematic comparison and further development of full-duplex spoken dialogue systems.
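The kind of timestamp-based turn-taking metric the summary describes can be illustrated with a small sketch. Everything here is an assumption for illustration: the segment representation, the `takeover_latency` and `takeover_rate` helpers, and the 2-second response window are hypothetical, not the benchmark's actual API or metric definitions.

```python
# Hypothetical sketch of automatic turn-taking metrics computed from
# voice-activity timestamps (seconds). Not the benchmark's real API.

def takeover_latency(user_end, model_segments):
    """Delay between the end of the user's turn and the onset of the
    model's first subsequent speech segment, or None if the model
    never takes the turn."""
    for start, _end in sorted(model_segments):
        if start >= user_end:
            return start - user_end
    return None

def takeover_rate(trials, window=2.0):
    """Fraction of trials in which the model starts speaking within
    `window` seconds of the user's turn end (assumed threshold)."""
    hits = 0
    for user_end, segments in trials:
        latency = takeover_latency(user_end, segments)
        if latency is not None and latency <= window:
            hits += 1
    return hits / len(trials) if trials else 0.0

# Example: two trials; the model responds ~0.4 s after the user's
# turn ends in the first, and never responds in the second.
trials = [
    (5.0, [(5.4, 7.0)]),  # one model speech segment from 5.4 s to 7.0 s
    (5.0, []),            # model stays silent
]
print(round(takeover_latency(5.0, [(5.4, 7.0)]), 3))  # 0.4
print(takeover_rate(trials))                          # 0.5
```

A real harness would derive the segment boundaries from a voice-activity detector over the model's output audio rather than hand-written timestamps.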

๐Ÿ“ Abstract
Spoken dialogue modeling introduces unique challenges beyond text-based language modeling, demanding robust turn-taking, backchanneling, and real-time interaction. Although most Spoken Dialogue Models (SDMs) rely on half-duplex processing (handling speech one turn at a time), emerging full-duplex SDMs can listen and speak simultaneously, enabling more natural and engaging conversations. However, current evaluations of such models remain limited, often focusing on turn-based metrics or high-level corpus analyses (e.g., turn gaps, pauses). To address this gap, we present Full-Duplex-Bench, a new benchmark that systematically evaluates key conversational behaviors: pause handling, backchanneling, turn-taking, and interruption management. Our framework uses automatic metrics for consistent and reproducible assessments of SDMs' interactive performance. By offering an open and standardized evaluation benchmark, we aim to advance spoken dialogue modeling and encourage the development of more interactive and natural dialogue systems.
Problem

Research questions and friction points this paper is trying to address.

Full-duplex spoken dialogue models can listen and speak simultaneously, but existing evaluations focus on turn-based metrics or coarse corpus statistics (e.g., turn gaps, pauses).
Interactive behaviors such as pause handling, backchanneling, turn-taking, and interruption management are not systematically assessed.
No open, standardized benchmark exists for measuring the interactive performance of full-duplex dialogue systems.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Full-Duplex-Bench, an open benchmark for evaluating full-duplex SDMs
Formally assesses pause handling, backchanneling, turn-taking, and interruption management
Uses automatic metrics for consistent, reproducible SDM assessment
๐Ÿ”Ž Similar Papers
No similar papers found.