OSExpert: Computer-Use Agents Learning Professional Skills via Exploration

📅 2026-03-09
🤖 AI Summary
Current general-purpose computer-using agents struggle to match human experts in complex tasks due to inefficiency, poor generalization, and a lack of fine-grained operational capabilities. To address these limitations, this work proposes a GUI-based depth-first search (GUI-DFS) exploration mechanism that enables agents to autonomously verify the functionality of interface elements and construct curricula for composite tasks through composable skills. The approach further incorporates a primitive action database to accumulate reusable skills and integrates runtime expansion optimization with procedural knowledge injection, facilitating efficient decision-making within perceived capability boundaries. Evaluated on the OSExpert-Eval benchmark, the method achieves approximately a 20% performance improvement and reduces the gap in task execution efficiency with human experts by about 80%, significantly advancing agents' proficiency toward expert-level computer operation.

📝 Abstract
General-purpose computer-use agents have shown impressive performance across diverse digital environments. However, our new benchmark, OSExpert-Eval, indicates they remain far less helpful than human experts. Although inference-time scaling enables adaptation, these agents complete complex tasks inefficiently and with degraded performance, transfer poorly to unseen UIs, and struggle with fine-grained action sequences. To address these limitations, we introduce a GUI-based depth-first search (GUI-DFS) exploration algorithm that comprehensively explores and verifies an environment's unit functions. The agent then exploits compositionality between unit skills to self-construct a curriculum for composite tasks. To support fine-grained actions, we curate a database of action primitives for agents to discover during exploration; these are saved as a skill set once exploration is complete. We use the learned skills to improve the agent's performance and efficiency by (1) enriching agents with ready-to-use procedural knowledge, allowing them to plan only once for long trajectories and generate accurate actions, and (2) enabling them to end inference-time scaling earlier by recognizing the boundary of their capabilities. Extensive experiments show that our environment-learned agent takes a meaningful step toward expert-level computer use, achieving a performance gain of around 20 percent on OSExpert-Eval and closing the efficiency gap to humans by around 80 percent.
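The exploration step described in the abstract can be illustrated with a minimal sketch. It assumes a toy model in which a GUI is a mapping from screens to clickable elements. `ToyUI`, `gui_dfs`, and the screen and element names are hypothetical and are not taken from the paper. A real agent would act on live screenshots and verify each function's effect, which this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToyUI:
    """Hypothetical stand-in for a GUI (an assumption, not the paper's environment).

    Maps each screen to its clickable elements. An element either opens
    another screen or maps to None when it performs a terminal unit function.
    """
    transitions: Dict[str, Dict[str, Optional[str]]]


def gui_dfs(ui: ToyUI, start: str, max_depth: int = 5) -> Dict[str, List[str]]:
    """Depth-first exploration recording the click path to each unit function."""
    skills: Dict[str, List[str]] = {}
    visited = set()

    def visit(screen: str, path: List[str], depth: int) -> None:
        # Skip screens already explored and stop at the depth limit.
        if screen in visited or depth > max_depth:
            return
        visited.add(screen)
        for element, target in ui.transitions.get(screen, {}).items():
            new_path = path + [element]
            if target is None:
                # Leaf element: store its full action path as a unit skill.
                skills[element] = new_path
            else:
                # Non-leaf element: descend before trying sibling elements.
                visit(target, new_path, depth + 1)

    visit(start, [], 0)
    return skills


ui = ToyUI({
    "main": {"File": "file_menu", "Format": "format_menu"},
    "file_menu": {"Save": None, "Export": "export_dialog"},
    "export_dialog": {"Export as PDF": None},
    "format_menu": {"Bold": None},
})
skills = gui_dfs(ui, "main")
```

Here `skills` maps each discovered unit function to the sequence of clicks that reaches it, which is one plausible shape for the saved skill set.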

Problem

Research questions and friction points this paper is trying to address.

computer-use agents
complex tasks
unseen UIs
fine-grained actions
expert-level performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

GUI-based DFS
skill compositionality
action primitives
curriculum self-construction
inference-time scaling
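Skill compositionality and curriculum self-construction can be sketched as follows. This page does not describe how the paper selects or orders composite tasks, so the sketch simply enumerates ordered combinations of unit skills, shortest first, as a stated assumption. `build_curriculum` and the skill names are hypothetical.

```python
from itertools import permutations
from typing import Dict, List


def build_curriculum(unit_skills: Dict[str, List[str]], max_len: int = 2) -> List[dict]:
    """Enumerate composite tasks as ordered combinations of unit skills.

    Assumption: exhaustive enumeration, shorter compositions first. The
    paper's actual selection strategy is not specified in this summary.
    """
    names = sorted(unit_skills)
    curriculum = []
    for length in range(2, max_len + 1):
        for combo in permutations(names, length):
            # Concatenate the action paths of the chosen unit skills in order.
            steps = [action for name in combo for action in unit_skills[name]]
            curriculum.append({"task": " then ".join(combo), "steps": steps})
    return curriculum


unit_skills = {"Bold": ["Format", "Bold"], "Save": ["File", "Save"]}
curriculum = build_curriculum(unit_skills)
```

Each curriculum entry pairs a composite task description with the concatenated action sequence, so an agent could replay a long trajectory from a single plan.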