🤖 AI Summary
Problem: Most existing browser-using agents (BUAs) terminate after executing a single instruction, which leaves them ill-suited to complex, nonlinear browsing tasks involving ambiguous user goals, iterative decision-making, and changing contexts.
Method: This paper proposes a human-in-the-loop (HITL) conceptual framework for browser agents, informed by theories of human web browsing behavior. The framework centers on an iterative loop in which the BUA proactively proposes next actions and the user steers the browsing process through feedback. It explicitly distinguishes exploration actions from exploitation actions, letting users control the breadth and depth of their browsing.
Contribution/Results: The framework is illustrated with hypothetical use cases rather than evaluated empirically. It aims to reduce users' physical and cognitive effort while preserving their traditional browsing mental model and helping them reach satisfactory outcomes. The paper also discusses the shift from manual browsing to interaction-driven browsing, and its contribution is a theoretically informed conceptual framework for BUAs.
📝 Abstract
Although browser-using agents (BUAs) show promise for web tasks and automation, most BUAs terminate after executing a single instruction, failing to support users' complex, nonlinear browsing with ambiguous goals, iterative decision-making, and changing contexts. We present a human-in-the-loop (HITL) conceptual framework informed by theories of human web browsing behavior. The framework centers on an iterative loop in which the BUA proactively proposes next actions and the user steers the browsing process through feedback. It also distinguishes between exploration and exploitation actions, enabling users to control the breadth and depth of their browsing. Consequently, the framework aims to reduce users' physical and cognitive effort while preserving users' traditional browsing mental model and supporting users in achieving satisfactory outcomes. We illustrate how the framework operates with hypothetical use cases and discuss the shift from manual browsing to interaction-driven browsing. We contribute a theoretically informed conceptual framework for BUAs.
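The iterative loop described in the abstract can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: every name here (`ActionKind`, `Proposal`, `Feedback`, `propose`, `browse`, `scripted_user`) is hypothetical, the `propose` function is a stand-in for whatever reasoning a real BUA would perform, and the feedback fields are one possible way a user might steer between exploration and exploitation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List, Optional


class ActionKind(Enum):
    EXPLORE = "explore"  # widen the search (breadth)
    EXPLOIT = "exploit"  # go deeper on the current item (depth)


@dataclass
class Proposal:
    kind: ActionKind
    description: str


@dataclass
class Feedback:
    accept: bool                        # should the proposed action be carried out?
    steer: Optional[ActionKind] = None  # preferred kind for the next proposal
    done: bool = False                  # user is satisfied; stop the loop


def propose(history: List[Proposal], preferred: ActionKind) -> Proposal:
    """Hypothetical stand-in for the agent's reasoning step."""
    step = len(history) + 1
    if preferred is ActionKind.EXPLORE:
        return Proposal(preferred, f"step {step}: open a new related page")
    return Proposal(preferred, f"step {step}: read further into the current page")


def browse(get_feedback: Callable[[Proposal], Feedback],
           max_steps: int = 10) -> List[Proposal]:
    """Run the propose/feedback loop and return the accepted actions."""
    history: List[Proposal] = []
    preferred = ActionKind.EXPLORE  # assumption: start broad, then narrow
    for _ in range(max_steps):
        proposal = propose(history, preferred)  # agent proposes a next action
        feedback = get_feedback(proposal)       # user steers via feedback
        if feedback.accept:
            history.append(proposal)            # accepted actions are recorded
        if feedback.steer is not None:
            preferred = feedback.steer          # user shifts breadth vs. depth
        if feedback.done:
            break
    return history


def scripted_user(script: List[Feedback]) -> Callable[[Proposal], Feedback]:
    """Replay a fixed list of feedback items, standing in for a real user."""
    remaining = iter(script)
    return lambda proposal: next(remaining)


if __name__ == "__main__":
    user = scripted_user([
        Feedback(accept=True),
        Feedback(accept=False, steer=ActionKind.EXPLOIT),
        Feedback(accept=True, done=True),
    ])
    for action in browse(user):
        print(action.kind.value, "-", action.description)
```

In this sketch the agent never acts without a user response, which mirrors the abstract's emphasis on the user steering the process; a real system would replace `scripted_user` with an interactive prompt and `propose` with the agent's own reasoning.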