Efficient Agent Training for Computer Use

📅 2025-05-20

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Training agents for computer use relies heavily on large-scale, costly human demonstration data. This paper proposes a lightweight data synthesis and training approach that reduces that reliance. Starting from only 312 human-annotated trajectories, the authors use Claude 3.7 Sonnet to synthesize diverse action decisions for them, then train the PC Agent-E model on the enriched trajectories. The results suggest that a small set of high-quality seed trajectories can be enough to elicit strong desktop interaction capabilities. PC Agent-E achieves a 141% relative improvement on WindowsAgentArena-V2, surpassing Claude 3.7 Sonnet with extended thinking, and it generalizes to other operating systems on the OSWorld benchmark.

📝 Abstract

Scaling up high-quality trajectory data has long been a critical bottleneck for developing human-like computer use agents. We introduce PC Agent-E, an efficient agent training framework that significantly reduces reliance on large-scale human demonstrations. Starting with just 312 human-annotated computer use trajectories, we further improved data quality by synthesizing diverse action decisions with Claude 3.7 Sonnet. Trained on these enriched trajectories, our PC Agent-E model achieved a remarkable 141% relative improvement, surpassing the strong Claude 3.7 Sonnet with extended thinking on WindowsAgentArena-V2, an improved benchmark we also released. Furthermore, PC Agent-E demonstrates strong generalizability to different operating systems on OSWorld. Our findings suggest that strong computer use capabilities can be stimulated from a small amount of high-quality trajectory data.
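The "141% relative improvement" figure is a relative gain, not a difference in percentage points: the improved score is about 2.41 times the baseline score. The short sketch below shows that arithmetic. The two success rates are hypothetical values chosen only to illustrate the formula; they are not results reported on this page.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline`, as a percentage of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (improved - baseline) / baseline * 100.0


# Hypothetical success rates (in percent), used only to show the calculation.
baseline_rate = 10.0
improved_rate = 24.1

gain = relative_improvement(baseline_rate, improved_rate)
print(f"{gain:.0f}% relative improvement")
```

With these illustrative inputs, an absolute gain of 14.1 percentage points corresponds to a 141% relative improvement.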

Problem

Research questions and friction points this paper is trying to address.

Reducing reliance on large-scale human demonstrations for agent training

Improving data quality with synthesized diverse action decisions

Enhancing agent generalizability across different operating systems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses a small dataset of 312 human-annotated computer use trajectories

Enriches the data with diverse action decisions synthesized by Claude 3.7 Sonnet

Achieves a 141% relative improvement on WindowsAgentArena-V2 with minimal human data
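One way to picture the enrichment idea is as a data structure in which each step of a human trajectory carries extra model-proposed actions. The sketch below is a rough illustration under stated assumptions, not the paper's implementation: the names `Step`, `Trajectory`, `enrich`, and `to_training_examples` are hypothetical, the page does not specify the data format or how Claude 3.7 Sonnet is queried, and `stub_propose` is a placeholder standing in for a model call.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Step:
    observation: str   # e.g. a screenshot reference or UI description (assumed format)
    human_action: str  # the action the human annotator took at this step
    synthesized_actions: List[str] = field(default_factory=list)


@dataclass
class Trajectory:
    task: str
    steps: List[Step]


# Hypothetical proposer signature: (task, observation, k) -> candidate actions.
Proposer = Callable[[str, str, int], List[str]]


def enrich(trajectory: Trajectory, propose: Proposer, k: int = 3) -> Trajectory:
    """Attach up to k distinct alternative actions to every step, in place."""
    for step in trajectory.steps:
        seen = {step.human_action, *step.synthesized_actions}
        for action in propose(trajectory.task, step.observation, k):
            if action not in seen:  # skip duplicates of known actions
                seen.add(action)
                step.synthesized_actions.append(action)
    return trajectory


def to_training_examples(trajectory: Trajectory) -> List[Tuple[str, str, str]]:
    """Flatten a trajectory into (task, observation, action) training triples."""
    examples = []
    for step in trajectory.steps:
        for action in [step.human_action, *step.synthesized_actions]:
            examples.append((trajectory.task, step.observation, action))
    return examples


def stub_propose(task: str, observation: str, k: int) -> List[str]:
    """Placeholder for a model call; returns k dummy alternatives."""
    return [f"alternative {i} for {observation}" for i in range(k)]


demo = Trajectory(
    task="rename a file",
    steps=[Step("file explorer open", "right-click file"),
           Step("context menu open", "click Rename")],
)
examples = to_training_examples(enrich(demo, stub_propose, k=3))
print(len(examples))
```

In this sketch each human step yields one example for the annotated action plus one per synthesized alternative, so a small seed set expands into a larger training set without new human annotation.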
