GUI-GENESIS: Automated Synthesis of Efficient Environments with Verifiable Rewards for GUI Agent Post-Training

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses key obstacles to training GUI agents on real-world applications: high latency, poor reproducibility, and unverifiable rewards derived from noisy visual proxies. The authors leverage a multimodal code model to automatically refactor native GUI applications into lightweight web-based environments and introduce, for the first time, code-native executable assertions as a verifiable reward mechanism. This framework substantially improves both training efficiency and verifiability: environment latency drops by an order of magnitude, and training costs decrease by over $28,000 per epoch. Empirically, the trained agent outperforms the base model by 14.54% on held-out real-world tasks and surpasses real-world reinforcement learning baselines by 3.27%.

📝 Abstract
Post-training GUI agents in interactive environments is critical for developing generalization and long-horizon planning capabilities. However, training on real-world applications is hindered by high latency, poor reproducibility, and unverifiable rewards that rely on noisy visual proxies. To address these limitations, we present GUI-GENESIS, the first framework to automatically synthesize efficient GUI training environments with verifiable rewards. GUI-GENESIS reconstructs real-world applications into lightweight web environments using multimodal code models and equips them with code-native rewards: executable assertions that provide deterministic reward signals and eliminate visual estimation noise. Extensive experiments show that GUI-GENESIS reduces environment latency tenfold and costs by over $28,000 per epoch compared to training on real applications. Notably, agents trained with GUI-GENESIS outperform the base model by 14.54% and even real-world RL baselines by 3.27% on held-out real-world tasks. Finally, we observe that models can synthesize environments they cannot yet solve, highlighting a pathway for self-improving agents.
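To make the "code-native reward" idea concrete, the sketch below shows what an executable assertion over a synthesized environment's state might look like. This is an illustrative reconstruction, not code from the paper: the `EnvState` class, the sample shopping task, and the `code_native_reward` function are all hypothetical, chosen to contrast a deterministic state check with a noisy visual estimate.

```python
# Hypothetical sketch of a code-native reward: instead of estimating task
# success from screenshots, the synthesized web environment exposes its state
# directly, and an executable assertion checks that state deterministically.
# EnvState and the example task are illustrative, not from the paper.

from dataclasses import dataclass, field


@dataclass
class EnvState:
    """Minimal stand-in for a synthesized web environment's observable state."""
    cart: list = field(default_factory=list)
    checkout_complete: bool = False


def code_native_reward(state: EnvState) -> float:
    """Executable assertion for the task "add 'notebook' to the cart and
    check out": returns a deterministic 0/1 reward with no visual noise."""
    try:
        assert "notebook" in state.cart
        assert state.checkout_complete
        return 1.0
    except AssertionError:
        return 0.0


# A successful trajectory satisfies both assertions; a partial one fails.
done = EnvState(cart=["notebook"], checkout_complete=True)
partial = EnvState(cart=["notebook"], checkout_complete=False)
print(code_native_reward(done), code_native_reward(partial))  # 1.0 0.0
```

Because the assertion executes against ground-truth program state rather than a rendered screenshot, the same trajectory always yields the same reward, which is the verifiability property the abstract attributes to code-native rewards.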
Problem

Research questions and friction points this paper is trying to address.

GUI agent post-training
verifiable rewards
environment synthesis
real-world applications
reward noise
Innovation

Methods, ideas, or system contributions that make the work stand out.

GUI-GENESIS
code-native rewards
executable assertions
environment synthesis
post-training