From Reactive to Proactive: Assessing the Proactivity of Voice Agents via ProVoice-Bench

📅 2026-04-16

🤖 AI Summary

Existing benchmarks for voice-based agents focus primarily on passive responsiveness and do not adequately assess an agent's capacity for proactive intervention and monitoring. To address this gap, this work proposes ProVoice-Bench, the first evaluation framework designed specifically for proactive voice agents. The framework introduces four novel task categories that simulate proactive interaction scenarios and uses a multi-stage data synthesis pipeline to build a high-quality benchmark dataset of 1,182 samples. A systematic evaluation of state-of-the-art multimodal large language models on this benchmark reveals significant deficiencies in their ability to decide when to act and to reason about context, most notably excessive triggering (over-triggering) and logical inconsistencies. These findings highlight critical open challenges in building robust proactive voice interaction.

📝 Abstract

Recent advancements in LLM agents are gradually shifting from reactive, text-based paradigms toward proactive, multimodal interaction. However, existing benchmarks primarily focus on reactive responses, overlooking the complexities of proactive intervention and monitoring. To bridge this gap, we introduce ProVoice-Bench, the first evaluation framework specifically designed for proactive voice agents, featuring four novel tasks. By leveraging a multi-stage data synthesis pipeline, we curate 1,182 high-quality samples for rigorous testing. Our evaluation of state-of-the-art Multimodal LLMs reveals a significant performance gap, particularly regarding over-triggering and reasoning capabilities. These findings highlight the limitations of current models and offer a roadmap for developing more natural, context-aware proactive agents.

Problem

Research questions and friction points this paper is trying to address.

proactive voice agents, evaluation benchmark, multimodal interaction, proactivity assessment, LLM agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

proactive voice agents, ProVoice-Bench, multimodal LLMs