🤖 AI Summary
Existing LLM-based agents predominantly rely on explicit user instructions, imposing high operational and cognitive burdens on users. This work introduces an end-to-end proactive LLM agent system that enables continuous contextual understanding and autonomous service initiation by jointly modeling multi-level environmental signals—visual, auditory, and behavioral—together with user-specific persona cues. Methodologically, it proposes: (1) an on-demand hierarchical perception mechanism supporting a lightweight edge-cloud collaborative architecture; and (2) a context-aware proactive reasoning framework for real-time intent prediction and tool invocation from heterogeneous sensory inputs. Evaluated in real-world scenarios, the system achieves up to a 33.4% improvement in proactive intent prediction accuracy and a 16.8% gain in tool invocation F1-score, while significantly enhancing user satisfaction. This work marks a shift for LLM agents from reactive response toward autonomous, proactive service delivery.
📝 Abstract
Large Language Model (LLM) agents are emerging to transform daily life. However, existing LLM agents primarily follow a reactive paradigm, relying on explicit user instructions to initiate services, which increases both physical and cognitive workload. In this paper, we propose ProAgent, the first end-to-end proactive agent system that harnesses massive sensory contexts and LLM reasoning to deliver proactive assistance. ProAgent first employs a proactive-oriented context extraction approach with on-demand tiered perception to continuously sense the environment and derive hierarchical contexts that incorporate both sensory and persona cues. ProAgent then adopts a context-aware proactive reasoner to map these contexts to user needs and tool calls, providing proactive assistance. We implement ProAgent on Augmented Reality (AR) glasses with an edge server and extensively evaluate it on a real-world testbed, a public dataset, and through a user study. Results show that ProAgent achieves up to 33.4% higher proactive prediction accuracy, 16.8% higher tool-calling F1 score, and notable improvements in user satisfaction over state-of-the-art baselines, marking a significant step toward proactive assistants. A video demonstration of ProAgent is available at https://youtu.be/pRXZuzvrcVs.