🤖 AI Summary
Enterprise deployment of general-purpose agents is hindered by poor robustness, high maintenance overhead, and limited practicality. Method: This paper proposes a progressive evolution paradigm tailored for enterprise adoption, integrating systematic failure attribution analysis, a lightweight iterative optimization framework, and a multi-round feedback-driven policy fine-tuning mechanism. Built upon state-of-the-art general agent architectures, it realizes an enterprise-grade Computer Using Generalist Agent (CUGA). Automated evaluation with benchmark toolchains such as WebArena enables closed-loop iteration that improves task completion rates and execution stability. Contribution/Results: Experimental results show that CUGA achieves a new state-of-the-art result on WebArena, supporting the feasibility of a low-cost, reliable, and maintainable enterprise agent system.
📝 Abstract
This paper presents our ongoing work toward developing an enterprise-ready Computer Using Generalist Agent (CUGA) system. Our research highlights the evolutionary nature of building agentic systems suitable for enterprise environments. By integrating state-of-the-art agentic AI techniques with a systematic approach to iterative evaluation, analysis, and refinement, we have achieved rapid and cost-effective performance gains, notably reaching a new state of the art on the WebArena benchmark. We detail our development roadmap and the methodology and tools that facilitated rapid learning from failures and continuous system refinement, and we discuss key lessons learned and future challenges for enterprise adoption.