🤖 AI Summary
Current computer-using agents (CUAs) underperform on complex, long-horizon human-computer interaction tasks, primarily constrained by the capability bottlenecks of small-scale models. To address this, we propose Self-Evolution Agent (SEA), a lightweight CUA built upon a 7B-parameter foundation with intrinsic self-evolution capability. Methodologically: (1) we design a verifiable automated trajectory generation pipeline; (2) we introduce a stepwise reinforcement learning framework integrating programmatic data synthesis and fine-grained reward shaping; and (3) we propose a training-free localization-planning capability fusion mechanism for enhanced reasoning. Experiments demonstrate that SEA significantly outperforms comparably sized baselines and approaches the performance of substantially larger models, achieving strong generalization and practical utility on multi-step GUI navigation tasks. To foster reproducibility and community advancement, we will publicly release both the model and code.
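The summary mentions a stepwise reinforcement learning framework with fine-grained reward shaping but gives no formulation. As a purely illustrative sketch, per-step discounted returns are one common building block for assigning credit to individual steps of a long trajectory rather than a single episode-level reward; the function name, signature, and discount factor below are our assumptions, not details from the paper:

```python
def stepwise_returns(step_rewards, gamma=0.99):
    """Compute the discounted return at each step of a trajectory.

    step_rewards: per-step scalar rewards (e.g., from fine-grained
    reward shaping on each GUI action), earliest step first.
    Returns a list of the same length: returns[t] = sum over k >= t
    of gamma**(k - t) * step_rewards[k].
    """
    returns = []
    g = 0.0
    # Walk the trajectory backwards, accumulating the discounted sum.
    for r in reversed(step_rewards):
        g = r + gamma * g
        returns.append(g)
    return list(reversed(returns))


# Example: only the final step succeeds (sparse terminal reward),
# yet every earlier step receives a shaped, discounted signal.
print(stepwise_returns([0.0, 0.0, 1.0], gamma=0.5))  # → [0.25, 0.5, 1.0]
```

Such per-step targets let a policy be updated on individual transitions instead of whole rollouts, which is one way the heavy computational cost of long-horizon training can be reduced.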
📝 Abstract
Computer use agents are an emerging area of artificial intelligence that aims to operate computers to accomplish users' tasks, and they have attracted considerable attention from both industry and academia. However, the performance of current agents remains far from practical use. In this paper, we propose the Self-Evolution Agent (SEA) for computer use; to develop this agent, we introduce novel methods in data generation, reinforcement learning, and model enhancement. Specifically, we first propose an automatic pipeline to generate verifiable trajectories for training. We then propose an efficient step-wise reinforcement learning method that alleviates the substantial computational cost of long-horizon training. Finally, we propose an enhancement method that merges grounding and planning abilities into a single model without any additional training. Based on these innovations in data generation, training strategy, and model enhancement, we obtain the Self-Evolution Agent (SEA) for computer use with only 7B parameters, which outperforms models of the same size and achieves performance comparable to that of larger ones. We will open-source the model weights and related code in the future.
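The abstract does not specify how grounding and planning abilities are merged "without any additional training." One common family of training-free techniques is parameter-space merging of two checkpoints fine-tuned from the same base model, e.g., linear interpolation of their weights. The sketch below illustrates that general idea only; the function name, the dictionary-based state representation, and the choice of plain interpolation are all our assumptions, not the paper's method:

```python
def merge_state_dicts(grounding_sd, planning_sd, alpha=0.5):
    """Linearly interpolate two model state dicts, parameter by parameter.

    grounding_sd / planning_sd: mappings from parameter name to value
    (here plain floats for simplicity; real checkpoints hold tensors).
    alpha: weight on the grounding model; (1 - alpha) goes to planning.
    Both checkpoints must share the same architecture (same keys).
    """
    assert grounding_sd.keys() == planning_sd.keys(), "architectures must match"
    return {
        name: alpha * grounding_sd[name] + (1.0 - alpha) * planning_sd[name]
        for name in grounding_sd
    }


# Toy example with scalar "parameters":
grounding = {"layer.w": 1.0, "layer.b": 0.0}
planning = {"layer.w": 3.0, "layer.b": 2.0}
print(merge_state_dicts(grounding, planning, alpha=0.5))
# → {'layer.w': 2.0, 'layer.b': 1.0}
```

With real models the same loop would run over `torch` tensors loaded via `Module.state_dict()`, and `alpha` would be tuned on a held-out validation set.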