Towards Enterprise-Ready Computer Using Generalist Agent

📅 2025-02-24

🤖 AI Summary

Enterprise deployment of general-purpose agents is hindered by poor robustness, high maintenance overhead, and limited practicality.

Method: This paper proposes a progressive, evolutionary development paradigm tailored to enterprise adoption. It combines systematic failure attribution analysis, a lightweight iterative optimization framework, and multi-round, feedback-driven refinement. Built on state-of-the-art agentic techniques, it yields an enterprise-grade Computer Using Generalist Agent (CUGA). Automated evaluation with benchmark toolchains such as WebArena closes the loop, so each iteration can target higher task completion rates and more stable execution.

Contribution/Results: CUGA achieves new state-of-the-art performance on WebArena, supporting the feasibility of a low-cost, reliable, and maintainable enterprise agent system.
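
The closed-loop evaluation described above depends on knowing, after each benchmark run, how many tasks succeeded and which failure causes dominate. As a rough illustration only, the sketch below computes a task completion rate for one run and ranks failure categories by frequency, so the most common cause can be targeted in the next refinement round. The `EpisodeResult` record, the `summarize_run` function, and the category labels are hypothetical; they are not taken from the paper or from WebArena's tooling.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeResult:
    """Outcome of one benchmark task run (hypothetical record format)."""
    task_id: str
    success: bool
    # Hypothetical failure label, only meaningful when success is False.
    failure_category: Optional[str] = None


def summarize_run(results: list[EpisodeResult]) -> tuple[float, list[tuple[str, int]]]:
    """Return the task completion rate and failure categories ranked by count."""
    if not results:
        raise ValueError("no results to summarize")
    completed = sum(r.success for r in results)
    rate = completed / len(results)
    # Count failed episodes only; failures without a label are grouped as "unattributed".
    failures = Counter(
        r.failure_category or "unattributed" for r in results if not r.success
    )
    return rate, failures.most_common()


if __name__ == "__main__":
    run = [
        EpisodeResult("t1", True),
        EpisodeResult("t2", False, "element grounding"),
        EpisodeResult("t3", False, "element grounding"),
        EpisodeResult("t4", False, "planning"),
        EpisodeResult("t5", True),
    ]
    rate, ranked = summarize_run(run)
    print(f"completion rate: {rate:.0%}")  # completion rate: 40%
    for category, count in ranked:
        print(category, count)
```

Comparing the ranked list across successive runs is one simple way to check whether a fix aimed at the top failure category actually reduced it without lowering the overall completion rate.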

📝 Abstract

This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching new state-of-the-art performance on the WebArena benchmark. We detail our development roadmap and the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and we discuss key lessons learned and future challenges for enterprise adoption.

Problem

Developing an enterprise-ready Computer Using Generalist Agent (CUGA) system.

Integrating state-of-the-art agentic AI techniques with iterative evaluation, analysis, and refinement.

Achieving state-of-the-art performance on the WebArena benchmark.

Innovation

Integrates state-of-the-art agentic AI techniques

Uses iterative evaluation, failure analysis, and refinement for rapid, cost-effective performance gains

Achieves new state-of-the-art performance on the WebArena benchmark