Proactive Agent Research Environment: Simulating Active Users to Evaluate Proactive Assistants

📅 2026-04-01
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing proactive agent research lacks frameworks that simulate user behavior realistically, and so struggles to capture the stateful, temporal nature of interaction in digital environments. To address this, the paper introduces Pare, a research environment that models applications as finite state machines, enabling state-aware user simulation and state-dependent action spaces. Building on this foundation, the authors develop Pare-Bench, a benchmark of 143 tasks across four application categories: communication, productivity, scheduling, and lifestyle. The study is the first to formalize applications as finite state machines for high-fidelity user behavior simulation. It evaluates agents' capabilities in contextual observation, goal inference, intervention timing, and cross-application coordination across multiple scenarios, and offers an extensible experimental platform for future research on proactive digital assistants.
📝 Abstract
Proactive agents that anticipate user needs and autonomously execute tasks hold great promise as digital assistants, yet the lack of realistic user simulation frameworks hinders their development. Existing approaches model apps as flat tool-calling APIs, failing to capture the stateful and sequential nature of user interaction in digital environments and making realistic user simulation infeasible. We introduce Proactive Agent Research Environment (Pare), a framework for building and evaluating proactive agents in digital environments. Pare models applications as finite state machines with stateful navigation and state-dependent action space for the user simulator, enabling active user simulation. Building on this foundation, we present Pare-Bench, a benchmark of 143 diverse tasks spanning communication, productivity, scheduling, and lifestyle apps, designed to test context observation, goal inference, intervention timing, and multi-app orchestration.
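To make the contrast with flat tool-calling APIs concrete, the following is a minimal sketch of an application modeled as a finite state machine with a state-dependent action space. It is not Pare's implementation: the `AppFSM` class, the `TRANSITIONS` table, and the toy messaging states and actions are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical transition table for a toy messaging app:
# state -> {action -> next_state}. None of these names come from Pare.
TRANSITIONS = {
    "inbox": {"open_thread": "thread", "compose": "draft"},
    "thread": {"reply": "draft", "back": "inbox"},
    "draft": {"send": "inbox", "discard": "inbox"},
}


@dataclass
class AppFSM:
    """Illustrative app model: the screen the user is on is the FSM state."""

    state: str = "inbox"
    history: list = field(default_factory=list)

    def available_actions(self):
        """Return the actions a simulated user may take in the current state."""
        return sorted(TRANSITIONS[self.state])

    def step(self, action):
        """Apply an action, rejecting it if the current state does not offer it."""
        if action not in TRANSITIONS[self.state]:
            raise ValueError(f"{action!r} is not available in state {self.state!r}")
        self.history.append((self.state, action))
        self.state = TRANSITIONS[self.state][action]
        return self.state


app = AppFSM()
print(app.available_actions())  # ['compose', 'open_thread']
app.step("open_thread")
print(app.available_actions())  # ['back', 'reply']
```

Under a flat tool-calling model, an action such as `send` would be callable at any time. In this sketch it is offered only from the `draft` state, so a simulated user has to navigate there first, which is the sequential behavior the abstract describes.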
Problem

Research questions and friction points this paper is trying to address.

proactive agents
user simulation
stateful interaction
digital assistants
realistic evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

proactive agents
user simulation
finite state machines
digital assistants
benchmark