From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

📅 2026-04-16
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Existing benchmarks for voice-based agents primarily focus on passive responsiveness, failing to adequately assess their capacity for proactive intervention and monitoring. To address this gap, this work proposes ProVoice-Bench, the first evaluation framework specifically designed for proactive voice agents. The framework introduces four novel task categories that simulate active interaction scenarios and employs a multi-stage data synthesis pipeline to construct a high-quality benchmark dataset comprising 1,182 samples. Systematic evaluation of state-of-the-art multimodal large language models using this framework reveals significant deficiencies in current models' ability to trigger actions appropriately and reason contextually, particularly manifesting as excessive triggering and logical inconsistencies. These findings highlight critical challenges in developing robust proactive voice interaction capabilities.

πŸ“ Abstract
Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Bench, the first evaluation framework specifically designed for proactive voice agents, featuring four novel tasks. By leveraging a multi-stage data synthesis pipeline, we curate 1,182 high-quality samples for rigorous testing. Our evaluation of state-of-the-art Multimodal LLMs reveals a significant performance gap, particularly regarding over-triggering and reasoning capabilities. These findings highlight the limitations of current models and offer a roadmap for developing more natural, context-aware proactive agents.
Problem

Research questions and friction points this paper is trying to address.

proactive voice agents
evaluation benchmark
multimodal interaction
proactivity assessment
LLM agents
Innovation

Methods, ideas, or system contributions that make the work stand out.

proactive voice agents
ProVoice-Bench
multimodal LLMs
benchmarking
data synthesis