EE-MCP: Self-Evolving MCP-GUI Agents via Automated Environment Generation and Experience Learning

📅 2026-04-10

🤖 AI Summary

Existing agents that combine GUI interaction with structured MCP API calls lack a systematic model of how the two modalities should be coordinated, which hinders autonomous, continuous improvement across applications. The authors formalize MCP-GUI coordination as a unified hybrid policy learning problem and propose a fully automated self-evolving framework that improves iteratively without human intervention through automatic environment generation, gap-driven task synthesis, trajectory distillation, and quality-filtered training. They also introduce an experience bank that accumulates rules an LLM derives by comparing trajectories, which improves inference-time performance without fine-tuning. The evaluation shows that the best optimization strategy depends on a task's MCP-GUI composition: across three desktop applications, distillation reaches a 77.8% pass rate (+17.8 percentage points) on MCP-dominant tasks, while the experience bank improves performance by 10.0 percentage points on GUI-intensive tasks.

📝 Abstract

Computer-use agents that combine GUI interaction with structured API calls via the Model Context Protocol (MCP) show promise for automating software tasks. However, existing approaches lack a principled understanding of how agents should balance these two modalities and how to enable iterative self-improvement across diverse applications. We formulate MCP-GUI interplay as a unified hybrid policy learning problem where the agent learns when each modality provides complementary advantages, and show that distillation and experience augmentation target fundamentally different failure modes, requiring application-aware mechanism selection. Built on this formulation, we propose a self-evolving framework with a fully automatic pipeline that orchestrates automatic environment generation and validation, trajectory collection, gap-driven task synthesis, and quality-filtered training, all without manual intervention. A key innovation is our experience bank, which accumulates LLM-learned rules from trajectory comparison, enabling inference-time improvement without fine-tuning. Systematic cross-application analysis across three desktop applications reveals that the optimal strategy depends on MCP-GUI composition: distillation achieves 77.8% pass rate on MCP-dominant tasks (+17.8pp), while the experience bank excels on GUI-intensive tasks (+10.0pp).
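
To make the experience bank idea concrete, here is a minimal sketch of how rules might be stored and injected into a prompt at inference time, with no fine-tuning involved. It is an illustration under stated assumptions, not the paper's implementation: the `Rule`, `ExperienceBank`, and `build_prompt` names, the word-overlap retrieval, and the application names in the usage note are all hypothetical, and the step in which an LLM derives rules from trajectory comparison is assumed to have already produced the rule text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Rule:
    app: str   # application the rule was learned on (hypothetical field)
    text: str  # natural-language rule, assumed to come from an LLM


@dataclass
class ExperienceBank:
    rules: List[Rule] = field(default_factory=list)

    def add(self, app: str, text: str) -> None:
        # Skip exact duplicates so repeated comparisons do not bloat the bank.
        rule = Rule(app, text)
        if rule not in self.rules:
            self.rules.append(rule)

    def retrieve(self, app: str, task: str, k: int = 3) -> List[str]:
        # Assumed retrieval scheme: rank this app's rules by word overlap
        # with the task description and return the top k.
        task_words = set(task.lower().split())
        scored = []
        for rule in self.rules:
            if rule.app != app:
                continue
            overlap = len(task_words & set(rule.text.lower().split()))
            scored.append((overlap, rule.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


def build_prompt(bank: ExperienceBank, app: str, task: str) -> str:
    # Prepend retrieved rules to the task; the model weights stay unchanged.
    hints = bank.retrieve(app, task)
    if not hints:
        return f"Task: {task}"
    bullet_list = "\n".join(f"- {hint}" for hint in hints)
    return f"Rules learned from earlier trajectories:\n{bullet_list}\n\nTask: {task}"
```

For example, after `bank.add("calc", "Use the MCP call to set cell values instead of typing")`, calling `build_prompt(bank, "calc", "set cell values")` returns a prompt that lists that rule above the task, while a task for an application with no stored rules falls back to the bare task line. The application name `"calc"` is a placeholder; the source does not name the three applications.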

Problem

Research questions and friction points this paper is trying to address.

MCP-GUI interplay

hybrid policy learning

self-improvement

computer-use agents

modality balancing

Innovation

Methods, ideas, or system contributions that make the work stand out.

MCP-GUI interplay

self-evolving agents

experience bank