🤖 AI Summary
This work targets a practical gap in computer-use agents: small agents underperform and fail unevenly across specialized software domains, while deploying a separate large expert for each domain is expensive. The authors propose LearnWeak, an annotation-free specialization framework in which a stronger reference agent identifies the student agent's weaknesses in the target domain, synthesizes targeted tasks, and constructs supervision automatically. An error-aware training objective separates planning errors from execution errors, giving more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains, and its student-aware data generation and training outperform existing autonomous trajectory generation and training baselines.
📝 Abstract
Computer-use agents (CUAs) have recently made substantial progress, but deploying a separate large expert for each software domain remains expensive. Small open computer-use agents are more practical specialization targets, but they remain substantially weaker and exhibit uneven domain-specific failures. A straightforward remedy is to synthesize large-scale training data for the target domain, yet we find that this naive approach yields only marginal improvements. Building on this observation, we introduce LearnWeak, an annotation-free specialization framework for small computer-use agents that uses a stronger reference agent to identify the student's weaknesses in the target domain, synthesize targeted tasks, and construct supervision automatically. LearnWeak further introduces an error-aware specialization objective that disentangles planning and execution errors, enabling more behaviorally precise updates than broad uniform supervision. On OSWorld, LearnWeak achieves average gains of 11.6 and 11.1 percentage points over EvoCUA-8B and OpenCUA-7B, respectively, across eight domains. We also validate that our student-aware dataset generation and training approaches outperform existing autonomous trajectory generation and training baselines. Our work highlights the importance of student awareness in both data synthesis and agent training, pointing toward a more principled and efficient path for specializing small computer-use agents in diverse domains.
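The abstract does not say how the student's weaknesses are measured or how planning errors are told apart from execution errors. As a purely illustrative sketch, and not the authors' method, the idea of student awareness could look like the following. It assumes each episode records a task category, whether the student and the reference agent succeeded, and whether the student's plan agreed with the reference plan. All names here (`Episode`, `weakness_gaps`, `weakest_categories`, `error_type`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    """One evaluated task (hypothetical record layout, not from the paper)."""
    category: str       # assumed task category within the target domain
    student_ok: bool    # did the student agent complete the task?
    reference_ok: bool  # did the stronger reference agent complete it?
    plan_matches: bool  # assumed signal: student plan agrees with reference plan


def weakness_gaps(episodes):
    """Per-category success-rate gap (reference minus student)."""
    totals = defaultdict(lambda: [0, 0, 0])  # [count, student wins, reference wins]
    for ep in episodes:
        t = totals[ep.category]
        t[0] += 1
        t[1] += ep.student_ok
        t[2] += ep.reference_ok
    return {c: (r - s) / n for c, (n, s, r) in totals.items()}


def weakest_categories(gaps, k):
    """The k categories with the largest gap, i.e. where to target new tasks."""
    return sorted(gaps, key=gaps.get, reverse=True)[:k]


def error_type(ep):
    """Label a student failure as 'planning' or 'execution' (assumed heuristic)."""
    if ep.student_ok:
        return None
    return "execution" if ep.plan_matches else "planning"
```

In this toy setup, categories with the largest gap would receive more synthesized tasks, and the two error labels could be routed to separate training objectives, which is the general shape of the disentangling the abstract describes.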