ProAgentBench: Evaluating LLM Agents for Proactive Assistance with Real-World Data

📅 2026-02-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a critical limitation in existing proactive agent research, which predominantly relies on synthetic data and lacks contextual grounding in real-world user workflows, thereby hindering effective modeling of both intervention timing and content. To bridge this gap, the authors introduce ProAgentBench—the first benchmark for proactive agents based on authentic long-term user interactions. The framework decouples proactive assistance into two subtasks—timing prediction and assistive content generation—within a hierarchical task structure. A privacy-compliant dataset is constructed to preserve contextual integrity while capturing high-burst behavioral patterns (B=0.787). Experimental results demonstrate that models trained on real interaction data significantly outperform synthetic-data baselines, and that incorporating long-term context substantially improves intervention timing accuracy.

📝 Abstract
Proactive agents that anticipate user intentions without explicit prompts represent a significant evolution in human-AI interaction, promising to reduce cognitive load and streamline workflows. However, existing datasets suffer from two critical deficiencies: (1) reliance on LLM-synthesized data that fails to capture authentic human decision-making patterns, and (2) a focus on isolated tasks rather than continuous workflows, missing the pre-assistance behavioral context essential for learning proactive intervention signals. To address these gaps, we introduce ProAgentBench, a rigorous benchmark for proactive agents in working scenarios. Our contributions include: (1) a hierarchical task framework that decomposes proactive assistance into timing prediction and assistive content generation; (2) a privacy-compliant dataset with 28,000+ events from 500+ hours of real user sessions, preserving bursty interaction patterns (burstiness B=0.787) absent in synthetic data; and (3) extensive experiments evaluating LLM- and VLM-based baselines. Empirically, we show that long-term memory and historical context significantly enhance prediction accuracy, and that real-world training data substantially outperforms synthetic alternatives. We release our dataset and code at https://anonymous.4open.science/r/ProAgentBench-6BC0.
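The abstract reports a burstiness of B=0.787 for the collected interaction streams. The paper does not spell out the formula here, but a common definition (Goh and Barabási) computes B = (σ − μ) / (σ + μ) over the inter-event time intervals, so that B approaches 1 for highly bursty streams, 0 for a Poisson process, and −1 for perfectly regular events. A minimal sketch, assuming that definition:

```python
import statistics

def burstiness(timestamps):
    """Burstiness coefficient B = (sigma - mu) / (sigma + mu)
    computed over inter-event intervals of a sorted timestamp list.
    B -> 1 for highly bursty streams, 0 for Poisson, -1 for regular."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    mu = statistics.mean(intervals)
    sigma = statistics.pstdev(intervals)  # population std. deviation
    return (sigma - mu) / (sigma + mu)

# Perfectly regular events: all gaps equal, so sigma = 0 and B = -1.
print(burstiness([0, 1, 2, 3, 4]))  # -1.0

# Bursty events: short clusters separated by a long gap give B > 0.
print(burstiness([0, 0.1, 0.2, 10, 10.1]))
```

A value of 0.787 on this scale would indicate strongly clustered user activity, which is exactly the pattern the authors argue LLM-synthesized event streams fail to reproduce.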
Problem

Research questions and friction points this paper is trying to address.

Proactive Agents
Long-term User Context
VLM Annotation
Privacy Protection
Human-Computer Interaction
Yuanbo Tang
Tsinghua Shenzhen Graduate School, Tsinghua University, Shenzhen, China; FreeU Group (Open Collaborative AI Research Collective)
Huaze Tang
Tsinghua University, Tsinghua-Berkeley Shenzhen Institute
Multi-agent systems · Reinforcement learning · Swarm robots
Tingyu Cao
Tsinghua Shenzhen Graduate School, Tsinghua University, Shenzhen, China; FreeU Group (Open Collaborative AI Research Collective)
Lam Nguyen
Case Western Reserve University
large language models · knowledge management · multiagent systems
Anping Zhang
Tsinghua Shenzhen Graduate School, Tsinghua University, Shenzhen, China
Xinwen Cao
Tsinghua Shenzhen Graduate School, Tsinghua University, Shenzhen, China; FreeU Group (Open Collaborative AI Research Collective)
Chunkang Liu
Tsinghua Shenzhen Graduate School, Tsinghua University, Shenzhen, China; FreeU Group (Open Collaborative AI Research Collective)
Wenbo Ding
University at Buffalo
security · Machine Learning
Yang Li
Tsinghua Shenzhen International Graduate School
transfer learning · trustworthy AI · representation learning · spatial algorithms