🤖 AI Summary
This study investigates how to design human supervision strategies for large language model-driven computer-use agents (CUAs) that balance automation with human intervention. Framing CUA supervision as a coordination problem defined by delegation structure and engagement level, the authors propose a structural supervision framework and evaluate four distinct strategies in real-world web environments through a mixed-methods approach combining live web experiments, user behavior analysis, and qualitative interviews. Findings indicate that proactive supervision significantly reduces the incidence of problematic agent behaviors but does not correspondingly improve runtime intervention success. Moreover, supervision strategies exert a stronger influence on users' exposure to problematic behaviors than on post-hoc correction, and the identifiability of critical decision moments is central to effective intervention. The research reveals no universally optimal strategy, with subjective factors such as trust proving highly context-dependent.
📄 Abstract
LLM-powered computer-use agents (CUAs) are shifting users from direct manipulation to supervisory coordination. Existing oversight mechanisms, however, have largely been studied as isolated interface features, making broader oversight strategies difficult to compare. We conceptualize CUA oversight as a structural coordination problem defined by delegation structure and engagement level, and use this lens to compare four oversight strategies in a mixed-methods study with 48 participants in a live web environment. Our results show that oversight strategy more reliably shaped users' exposure to problematic actions than their ability to correct those actions once visible. Plan-based strategies were associated with lower rates of problematic agent actions, but not with equally strong gains in runtime intervention success once such actions became visible. On subjective measures, no single strategy was uniformly best, and the clearest context-sensitive differences appeared in trust. Qualitative findings further suggest that intervention depended not only on what controls users retained, but on whether risky moments became legible as requiring judgment during execution. These findings suggest that effective CUA oversight is not achieved by maximizing human involvement alone. Instead, it depends on how supervision is structured to surface decision-critical moments and support their recognition in time for meaningful intervention.