Efficient Agent Training for Computer Use

Date: 2025-05-20
AI Summary
The main bottleneck in training computer use agents is their heavy reliance on large-scale, costly human demonstration data. To address it, this paper proposes a lightweight, efficient data synthesis and training paradigm. Starting from only 312 human-annotated trajectories, the authors use Claude 3.7 Sonnet to synthesize diverse, high-quality action decisions that enrich the training trajectories. The framework covers trajectory synthesis, action-space modeling, LLM-driven decision refinement, and cross-OS generalization. The results provide empirical evidence that a small set of high-quality seed trajectories can be enough to elicit strong, generalizable desktop interaction capabilities. The resulting model achieves a 141% relative improvement on WindowsAgentArena-V2, outperforming Claude 3.7 Sonnet with extended thinking, and shows strong generalization to other operating systems on the OSWorld benchmark.
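The 141% figure is a relative gain: the trained model's score is about 2.41 times the baseline's, not 141 percentage points higher. The summary does not give the underlying scores, so this minimal sketch of the arithmetic uses hypothetical numbers. The function name `relative_improvement` and the values 10.0 and 24.1 are illustrative only and do not come from the paper.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the relative improvement of `improved` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates (not from the paper): going from 10.0% to 24.1%
# is a 141% relative improvement, although only 14.1 points in absolute terms.
print(round(relative_improvement(10.0, 24.1), 1))  # prints 141.0
```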


๐Ÿ“ Abstract
Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we further improved data quality by synthesizing diverse action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieved a remarkable 141% relative improvement, surpassing the strong Claude 3.7 Sonnet with extended thinking on WindowsAgentArena-V2, an improved benchmark we also released. Furthermore, PC Agent-E demonstrates strong generalizability to different operating systems on OSWorld. Our findings suggest that strong computer use capabilities can be stimulated from a small amount of high-quality trajectory data.
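The abstract says the 312 human trajectories were enriched by synthesizing diverse action decisions with Claude 3.7 Sonnet, but it does not spell out the procedure. The sketch below assumes one plausible reading. At each human step, a model proposes alternative actions for the same screen and action history, and each distinct alternative becomes an extra training target alongside the human action. The `Step` schema, the `enrich` function, the parameter `k`, and `fake_proposer` are all hypothetical. A real pipeline would call the model where the stub is used.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a human-annotated trajectory (hypothetical schema)."""
    screenshot: str  # placeholder for the observation, e.g. an image path
    action: str      # the human's action, e.g. "click(120, 45)"


# A proposer maps (task, previous actions, current screenshot) to candidate
# actions. Here it is a hypothetical stand-in for an LLM such as Claude 3.7 Sonnet.
Proposer = Callable[[str, List[str], str], List[str]]
Sample = Tuple[str, Tuple[str, ...], str, str]  # (task, history, screenshot, target)


def enrich(task: str, trajectory: List[Step], propose: Proposer, k: int = 2) -> List[Sample]:
    """Emit the human action plus up to k distinct synthesized alternatives per step."""
    samples: List[Sample] = []
    history: List[str] = []
    for step in trajectory:
        targets = [step.action]
        for candidate in propose(task, history, step.screenshot):
            if len(targets) > k:  # already have the human action + k alternatives
                break
            if candidate not in targets:
                targets.append(candidate)
        for target in targets:
            samples.append((task, tuple(history), step.screenshot, target))
        # The context always follows the human's action, not the alternatives.
        history.append(step.action)
    return samples


def fake_proposer(task: str, history: List[str], screenshot: str) -> List[str]:
    """Hypothetical offline stub returning fixed candidate actions."""
    return ["press('enter')", "click(120, 45)", "scroll(-3)", "type('hello')"]


traj = [Step("s0.png", "click(120, 45)"), Step("s1.png", "type('hello')")]
samples = enrich("Search for hello", traj, fake_proposer, k=2)
print(len(samples))  # prints 6: two steps, three targets each
```

Keeping the history on the human path means every synthesized sample shares a context that actually occurred, so no new environment rollouts are needed. Whether the paper makes exactly this choice is not stated in the abstract.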
Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on large-scale human demonstrations for agent training
Improving data quality with synthesized diverse action decisions
Enhancing agent generalizability across different operating systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a small dataset of only 312 human-annotated trajectories
Enriches the trajectories with diverse action decisions synthesized by Claude 3.7 Sonnet
Achieves strong performance with minimal human data
๐Ÿ”Ž Similar Papers
No similar papers found.