UI-Venus-1.5 Technical Report

📅 2026-02-09
🤖 AI Summary
This work proposes UI-Venus-1.5, a unified, end-to-end GUI agent that targets both broad generality and strong task performance in real-world scenarios. The model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B). Training combines a Mid-Training stage on 10 billion tokens, online reinforcement learning with full-trajectory rollouts, and model merging across the grounding, web, and mobile domains, so that a single checkpoint can execute tasks across platforms. UI-Venus-1.5 reports state-of-the-art results on ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), and demonstrates robust task completion on Chinese mobile applications.

📝 Abstract
GUI agents have emerged as a powerful paradigm for automating interactions in digital environments, yet achieving both broad generality and consistently strong task performance remains challenging. In this report, we present UI-Venus-1.5, a unified, end-to-end GUI Agent designed for robust real-world applications. The proposed model family comprises two dense variants (2B and 8B) and one mixture-of-experts variant (30B-A3B) to suit different downstream application scenarios. Compared to our previous version, UI-Venus-1.5 introduces three key technical advances: (1) a comprehensive Mid-Training stage leveraging 10 billion tokens across 30+ datasets to establish foundational GUI semantics; (2) Online Reinforcement Learning with full-trajectory rollouts, aligning training objectives with long-horizon, dynamic navigation in large-scale environments; and (3) a single unified GUI Agent constructed via Model Merging, which synthesizes domain-specific models (grounding, web, and mobile) into one cohesive checkpoint. Extensive evaluations demonstrate that UI-Venus-1.5 establishes new state-of-the-art performance on benchmarks such as ScreenSpot-Pro (69.6%), VenusBench-GD (75.0%), and AndroidWorld (77.6%), significantly outperforming previous strong baselines. In addition, UI-Venus-1.5 demonstrates robust navigation capabilities across a variety of Chinese mobile apps, effectively executing user instructions in real-world scenarios. Code: https://github.com/inclusionAI/UI-Venus; Model: https://huggingface.co/collections/inclusionAI/ui-venus
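The abstract does not say which merging method is used to combine the grounding, web, and mobile models. As a rough illustration only, the sketch below shows one common form of model merging: a weighted average of parameters that share the same names and shapes. The function name `merge_checkpoints`, the toy parameter lists, and the merge weights are all hypothetical and are not taken from the report.

```python
from typing import Dict, List

# A toy "state dict": parameter name -> flat list of values.
# Real checkpoints hold tensors; flat lists keep the sketch dependency-free.
StateDict = Dict[str, List[float]]


def merge_checkpoints(
    checkpoints: Dict[str, StateDict],
    weights: Dict[str, float],
) -> StateDict:
    """Weighted parameter average (an assumed method, not the report's)."""
    total = sum(weights[name] for name in checkpoints)
    if total <= 0:
        raise ValueError("merge weights must sum to a positive value")

    names = list(checkpoints)
    reference = checkpoints[names[0]]
    merged: StateDict = {}
    for key, ref_values in reference.items():
        merged_values = [0.0] * len(ref_values)
        for name in names:
            values = checkpoints[name][key]
            if len(values) != len(ref_values):
                raise ValueError(f"shape mismatch for {key!r} in {name!r}")
            scale = weights[name] / total  # normalise so weights sum to 1
            for i, value in enumerate(values):
                merged_values[i] += scale * value
        merged[key] = merged_values
    return merged


# Hypothetical domain-specific checkpoints with a single shared parameter.
toy_checkpoints = {
    "grounding": {"layer.weight": [1.0, 2.0]},
    "web": {"layer.weight": [3.0, 4.0]},
    "mobile": {"layer.weight": [5.0, 6.0]},
}
toy_weights = {"grounding": 0.5, "web": 0.25, "mobile": 0.25}

merged = merge_checkpoints(toy_checkpoints, toy_weights)
print(merged)
```

Averaging of this kind only makes sense when the domain models share one architecture and were fine-tuned from a common starting point, which is consistent with the abstract's description of a single cohesive checkpoint but is not stated there explicitly.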
Problem

Research questions and friction points this paper is trying to address.

GUI agent
task performance
generality
real-world automation
digital environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mid-Training
Online Reinforcement Learning
Model Merging
GUI Agent
End-to-End