🤖 AI Summary
To address the high resource consumption and economic cost of training intelligent agents in large-scale operating system (OS) environments, this paper introduces OSGym, a highly scalable distributed data engine. Built on lightweight virtualization and dynamic environment orchestration, OSGym runs over a thousand heterogeneous OS instances in parallel, enabling unified supervised fine-tuning and reinforcement learning across diverse tasks, including tool invocation, browser interaction, software engineering, and office applications. Its multi-agent parallel training framework generates up to 1,420 high-quality, multi-turn trajectories per minute at a cost of 0.2-0.3 USD per day per OS replica. Experiments show that models trained on the generated data outperform state-of-the-art baselines. The project is open-source, providing an infrastructure-level solution for large-scale, low-cost, and reproducible training of general-purpose computer agents.
📝 Abstract
We introduce OSGym, a super-scalable distributed data engine for training agents across diverse computer-related tasks. OSGym efficiently scales to over a thousand operating system (OS) replicas at an academia-affordable cost, serving as dynamic runtime environments for intelligent agents. It offers three key advantages. (1) Scalability: Despite the intensive resource requirements of running multiple OS replicas, OSGym parallelizes over a thousand instances while maintaining operational efficiency under constrained resources, generating up to 1420 multi-turn trajectories per minute. (2) Generality and Customizability: OSGym supports a broad spectrum of tasks that run on OS platforms, including tool use, browser interactions, software engineering, and office applications, with flexible support for diverse model training algorithms. (3) Economic Viability: OSGym operates at only 0.2-0.3 USD per day per OS replica using accessible on-demand compute providers. It is fully open-source and freely available for both research and commercial use. Experiments show that OSGym enables comprehensive data collection, supervised fine-tuning, and reinforcement learning pipelines for computer agents. Models trained with OSGym outperform state-of-the-art baselines, demonstrating its potential to advance scalability and universality in future agent research.
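To put the quoted figures in perspective, the short sketch below works through the arithmetic. It is only a back-of-the-envelope estimate under stated assumptions: a fleet of exactly 1,000 replicas (the abstract says "over a thousand"), the peak rate of 1,420 trajectories per minute sustained for a full day (the abstract gives it as an upper bound), and that this peak rate applies to a fleet of that size. The function names are illustrative and are not part of OSGym.

```python
# Back-of-the-envelope estimate from the figures quoted in the abstract.
# Assumptions (not from the paper): exactly 1,000 replicas, and the peak
# trajectory rate sustained around the clock.

MINUTES_PER_DAY = 60 * 24


def daily_fleet_cost_usd(replicas: int, usd_per_replica_day: float) -> float:
    """Total daily cost of running `replicas` OS replicas."""
    return replicas * usd_per_replica_day


def peak_trajectories_per_day(trajectories_per_minute: int) -> int:
    """Upper bound on daily trajectories if the peak rate were sustained."""
    return trajectories_per_minute * MINUTES_PER_DAY


def usd_per_thousand_trajectories(daily_cost: float, daily_trajectories: int) -> float:
    """Cost per 1,000 trajectories at the given daily cost and volume."""
    return daily_cost / daily_trajectories * 1000


REPLICAS = 1000  # assumed fleet size
cost_low = daily_fleet_cost_usd(REPLICAS, 0.2)   # lower quoted price
cost_high = daily_fleet_cost_usd(REPLICAS, 0.3)  # upper quoted price
peak_daily = peak_trajectories_per_day(1420)

print(f"Daily fleet cost: {cost_low:.0f}-{cost_high:.0f} USD")
print(f"Peak trajectories per day: {peak_daily:,}")
print(
    "Cost per 1,000 trajectories at peak: "
    f"{usd_per_thousand_trajectories(cost_low, peak_daily):.3f}-"
    f"{usd_per_thousand_trajectories(cost_high, peak_daily):.3f} USD"
)
```

Under these assumptions the fleet costs a few hundred USD per day. Real throughput will likely fall below the peak, so the per-trajectory cost should be read as a lower bound.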