ComputerRL: Scaling End-to-End Online Reinforcement Learning for Computer Use Agents

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses two key challenges in desktop automation: (1) the difficulty of autonomous agents operating effectively in human-centric GUI environments, and (2) the scalability limitations of end-to-end online reinforcement learning (RL). To bridge the semantic gap between high-level instructions and low-level GUI actions, we propose an API-GUI fused interaction paradigm; to mitigate entropy collapse in long-horizon RL, we introduce Entropulse—a novel entropy-regulated training strategy. Our approach employs a distributed architecture with large-scale parallel virtual desktop environments, alternating online RL and supervised fine-tuning. Evaluated on the OSWorld benchmark, the AutoGLM-OS-9B agent—built upon GLM-4-9B-0414—achieves 48.1% task accuracy, establishing a new state-of-the-art. This advancement significantly enhances the generality, robustness, and deployability of desktop automation agents.
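The API-GUI fused paradigm described above can be pictured as a single action space in which the agent emits either a programmatic API call or a low-level GUI event. The paper does not publish its interface, so the sketch below is purely illustrative: the type names, fields, and dispatcher are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Union

# Hypothetical action types for an API-GUI fused action space:
# the agent may emit either a high-level API call or a raw GUI event.
@dataclass
class ApiCall:
    name: str                      # e.g. "calendar.create_event" (illustrative)
    args: dict = field(default_factory=dict)

@dataclass
class GuiAction:
    kind: str                      # e.g. "click", "type", "scroll"
    x: int = 0
    y: int = 0
    text: str = ""

Action = Union[ApiCall, GuiAction]

def execute(action: Action) -> str:
    """Dispatch a fused action to the appropriate backend (stubbed here)."""
    if isinstance(action, ApiCall):
        return f"API:{action.name}({action.args})"
    return f"GUI:{action.kind}@({action.x},{action.y})"
```

In a real environment the two branches would route to an application API wrapper and a desktop input driver respectively; the point of the fusion is that the policy chooses between them per step rather than being locked into one modality.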

📝 Abstract
We introduce ComputerRL, a framework for autonomous desktop intelligence that enables agents to operate complex digital workspaces skillfully. ComputerRL features the API-GUI paradigm, which unifies programmatic API calls and direct GUI interaction to address the inherent mismatch between machine agents and human-centric desktop environments. Scaling end-to-end RL training is crucial for improvement and generalization across diverse desktop tasks, yet remains challenging due to environmental inefficiency and instability in extended training. To support scalable and robust training, we develop a distributed RL infrastructure capable of orchestrating thousands of parallel virtual desktop environments to accelerate large-scale online RL. Furthermore, we propose Entropulse, a training strategy that alternates reinforcement learning with supervised fine-tuning, effectively mitigating entropy collapse during extended training runs. We apply ComputerRL to the open models GLM-4-9B-0414 and Qwen2.5-14B and evaluate them on the OSWorld benchmark. AutoGLM-OS-9B, based on GLM-4-9B-0414, achieves a new state-of-the-art accuracy of 48.1%, demonstrating significant improvements for general agents in desktop automation. The algorithm and framework are adopted in building AutoGLM (Liu et al., 2024a).
Problem

Research questions and friction points this paper is trying to address.

Developing autonomous agents for complex desktop task automation
Addressing API-GUI mismatch in human-centric digital environments
Overcoming scalability challenges in online reinforcement learning training
Innovation

Methods, ideas, or system contributions that make the work stand out.

API-GUI paradigm unifying programmatic calls and GUI interactions
Distributed RL infrastructure orchestrating thousands of virtual desktops
Entropulse strategy alternating RL with supervised fine-tuning
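The Entropulse idea of alternating RL phases with SFT phases can be sketched as a simple training schedule. This is a minimal illustration of the alternation pattern only, under the assumption that each cycle runs some RL steps and then some SFT steps on trajectories collected during RL; the function and phase names are hypothetical, not the paper's code.

```python
# Hypothetical sketch of an Entropulse-style schedule: alternate online RL
# phases with supervised fine-tuning (SFT) phases to counteract entropy
# collapse over long training runs. All names here are illustrative.
def entropulse_schedule(num_cycles: int, rl_steps: int, sft_steps: int):
    """Yield (phase, cycle, step) tuples for an RL/SFT alternating schedule."""
    for cycle in range(num_cycles):
        for step in range(rl_steps):
            yield ("rl", cycle, step)    # online RL in parallel desktop envs
        for step in range(sft_steps):
            yield ("sft", cycle, step)   # SFT on successful RL rollouts
```

A trainer would consume this schedule, running policy-gradient updates in the "rl" phases and cross-entropy fine-tuning on filtered successful trajectories in the "sft" phases, which (per the paper's claim) restores policy entropy before the next RL phase.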
👥 Authors
Hanyu Lai — Tsinghua University
Xiao Liu — Tsinghua University, Zhipu AI
Yanxiao Zhao — University of Chinese Academy of Sciences
Han Xu — Zhipu AI
Hanchen Zhang — Tsinghua University
Bohao Jing — Zhipu AI
Yanyu Ren — Tsinghua University
Shuntian Yao — Tsinghua University
Yuxiao Dong — Tsinghua University
Jie Tang — Tsinghua University