CUA-Skill: Develop Skills for Computer Using Agent

📅 2026-01-28
🤖 AI Summary
Existing computer-using agents are hard to scale and significantly underperform humans because they lack reusable, structured skill abstractions. This work proposes a structured skill abstraction framework tailored to graphical user interface (GUI) interaction and introduces CUA-Skill, a large-scale skill library that encodes human operational knowledge as reusable skill units represented by parameterized execution graphs and composition graphs. Building on this library, the authors develop an end-to-end agent capable of dynamic skill retrieval, parameter instantiation, and memory-aware failure recovery. On the WindowsAgentArena benchmark, the agent achieves a success rate of 57.5% (best of three), outperforming existing methods while improving execution efficiency and robustness, and providing infrastructure for general-purpose computer-using agents.
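To make the two graph types concrete, here is a minimal Python sketch of how a skill with a parameterized execution graph and a composition graph over skills might be represented. It is an illustration only, not the CUA-Skill data model: the `Skill` class, the `expand` helper, the skill names (`open_app`, `save_as`, `create_note`), and the action strings such as `press_key(...)` are all hypothetical.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter


@dataclass
class Skill:
    """Hypothetical skill unit: a name, its parameters, and an execution graph."""
    name: str
    params: tuple[str, ...]
    # Execution graph: step id -> (action template, ids of prerequisite steps).
    steps: dict[str, tuple[str, tuple[str, ...]]]

    def instantiate(self, **args: str) -> list[str]:
        """Fill the templates with arguments and order steps by their dependencies."""
        missing = set(self.params) - set(args)
        if missing:
            raise ValueError(f"missing arguments: {sorted(missing)}")
        graph = {sid: set(deps) for sid, (_, deps) in self.steps.items()}
        order = TopologicalSorter(graph).static_order()
        return [self.steps[sid][0].format(**args) for sid in order]


def expand(name, skills, composition, args):
    """Flatten a composite skill into concrete actions via the composition graph."""
    if name in composition:  # composite skill: run its children in order
        actions = []
        for child in composition[name]:
            actions.extend(expand(child, skills, composition, args))
        return actions
    skill = skills[name]  # leaf skill: instantiate its own execution graph
    return skill.instantiate(**{p: args[p] for p in skill.params})


# Two assumed leaf skills, each a short chain of GUI actions.
skills = {
    "open_app": Skill("open_app", ("app",), {
        "press": ("press_key('win')", ()),
        "type": ("type_text('{app}')", ("press",)),
        "enter": ("press_key('enter')", ("type",)),
    }),
    "save_as": Skill("save_as", ("path",), {
        "hotkey": ("hotkey('ctrl', 's')", ()),
        "name": ("type_text('{path}')", ("hotkey",)),
        "confirm": ("press_key('enter')", ("name",)),
    }),
}

# Composition graph: a composite skill maps to an ordered list of sub-skills.
composition = {"create_note": ["open_app", "save_as"]}

plan = expand("create_note", skills, composition,
              {"app": "notepad", "path": "note.txt"})
for action in plan:
    print(action)
```

Keeping parameters separate from the graph structure is what makes a skill reusable in this sketch: the same `open_app` graph serves any application name, and composites reuse leaf skills without duplicating their steps.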

📝 Abstract
Computer-Using Agents (CUAs) aim to autonomously operate computer systems to complete real-world tasks. However, existing agentic systems remain difficult to scale and lag behind human performance. A key limitation is the absence of reusable and structured skill abstractions that capture how humans interact with graphical user interfaces and how to leverage these skills. We introduce CUA-Skill, a computer-using agentic skill base that encodes human computer-use knowledge as skills coupled with parameterized execution and composition graphs. CUA-Skill is a large-scale library of carefully engineered skills spanning common Windows applications, serving as a practical infrastructure and tool substrate for scalable, reliable agent development. Built upon this skill base, we construct CUA-Skill Agent, an end-to-end computer-using agent that supports dynamic skill retrieval, argument instantiation, and memory-aware failure recovery. Our results demonstrate that CUA-Skill substantially improves execution success rates and robustness on challenging end-to-end agent benchmarks, establishing a strong foundation for future computer-using agent development. On WindowsAgentArena, CUA-Skill Agent achieves a state-of-the-art success rate of 57.5% (best of three) while being significantly more efficient than prior and concurrent approaches. The project page is available at https://microsoft.github.io/cua_skill/.
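The agent loop described in the abstract (retrieve a skill, instantiate its arguments, recover from failure using memory) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the CUA-Skill Agent implementation: retrieval here is a simple word-overlap ranking standing in for whatever retriever the authors use, and the `retrieve` and `run` functions, the two `save_via_*` skills, and the failure they simulate are all hypothetical.

```python
def tokens(text):
    """Lowercased bag of words; a stand-in for a real retrieval model."""
    return set(text.lower().split())


def retrieve(task, library, excluded):
    """Rank skills by word overlap with the task, skipping skills that already failed."""
    scored = [
        (len(tokens(task) & tokens(description)), name)
        for name, (description, _) in library.items()
        if name not in excluded
    ]
    scored = [item for item in scored if item[0] > 0]
    return [name for _, name in sorted(scored, key=lambda s: (-s[0], s[1]))]


def run(task, args, library, max_attempts=3):
    """Try skills in ranked order; failures are stored in memory and not retried."""
    memory = []  # (skill name, error message) for each failed attempt
    for _ in range(max_attempts):
        failed = {name for name, _ in memory}
        ranked = retrieve(task, library, failed)
        if not ranked:
            break
        name = ranked[0]
        _, execute = library[name]
        try:
            # Argument instantiation: bind task arguments to the skill's parameters.
            return name, execute(**args), memory
        except RuntimeError as err:
            memory.append((name, str(err)))
    return None, None, memory


def save_via_hotkey(path):
    # Simulated failure, e.g. the window did not have keyboard focus.
    raise RuntimeError("hotkey had no effect")


def save_via_menu(path):
    return f"saved {path} via menu"


# Hypothetical skill library: name -> (description, executable).
library = {
    "save_via_hotkey": ("save the open document with a keyboard hotkey", save_via_hotkey),
    "save_via_menu": ("save the open document using the file menu", save_via_menu),
}

name, result, memory = run("save the document with hotkey",
                           {"path": "report.docx"}, library)
print(name, result, memory)
```

In this sketch the memory only records which skills failed and why, so the next retrieval falls back to an alternative skill instead of repeating the same failing action.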
Problem

Research questions and friction points this paper is trying to address.

Computer-Using Agents
skill abstraction
graphical user interfaces
agent scalability
human-computer interaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

skill abstraction
computer-using agent
parameterized execution
composition graph
memory-aware recovery