GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL

📅 2026-02-25
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the poor performance of open-source, native GUI agents on long-horizon navigation tasks, which the authors attribute to two factors: the scarcity of high-quality, action-aligned reasoning data, and the inadequacy of generic post-training pipelines for the unique challenges of GUI environments. To overcome these limitations, they propose a tailored training framework comprising (i) an 81K-scale, high-quality action-aware reasoning dataset, (ii) an action-aware supervised fine-tuning (SFT) strategy that aligns reasoning with execution, and (iii) a step-wise RLVR (reinforcement learning with verifiable rewards) recipe that adds a KL-divergence trust-region constraint and a success-adaptive gradient-scaling mechanism to stabilize training under partial verifiability and improve offline-to-online consistency. Experiments demonstrate substantial improvements in step-wise accuracy and end-to-end task completion across multiple web and mobile benchmarks, validating the efficacy of data-efficient post-training for advancing GUI agent capabilities.

📝 Abstract
Open-source native GUI agents still lag behind closed-source systems on long-horizon navigation tasks. This gap stems from two limitations: a shortage of high-quality, action-aligned reasoning data, and the direct adoption of generic post-training pipelines that overlook the unique challenges of GUI agents. We identify two fundamental issues in these pipelines: (i) standard SFT with CoT reasoning often hurts grounding, and (ii) step-wise RLVR-style training faces partial verifiability, where multiple actions can be correct but only a single demonstrated action is used for verification. This makes offline step-wise metrics weak predictors of online task success. In this work, we present GUI-Libra, a tailored training recipe that addresses these challenges. First, to mitigate the scarcity of action-aligned reasoning data, we introduce a data construction and filtering pipeline and release a curated 81K GUI reasoning dataset. Second, to reconcile reasoning with grounding, we propose action-aware SFT that mixes reasoning-then-action and direct-action data and reweights tokens to emphasize action and grounding. Third, to stabilize RL under partial verifiability, we identify the overlooked importance of KL regularization in RLVR and show that a KL trust region is critical for improving offline-to-online predictability; we further introduce success-adaptive scaling to downweight unreliable negative gradients. Across diverse web and mobile benchmarks, GUI-Libra consistently improves both step-wise accuracy and end-to-end task completion. Our results suggest that carefully designed post-training and data curation can unlock significantly stronger task-solving capabilities without costly online data collection. We release our dataset, code, and models to facilitate further research on data-efficient post-training for reasoning-capable GUI agents.
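The token reweighting idea in the abstract can be illustrated with a small sketch. This is not the paper's implementation: the weight value, the normalization, and the boolean action mask are illustrative assumptions; only the principle (upweighting action/grounding tokens relative to reasoning tokens in the SFT loss) comes from the abstract.

```python
# Toy sketch of action-aware SFT loss reweighting.
# Assumptions (not from the paper): a boolean mask marks action/grounding
# tokens, a scalar `action_weight` upweights them, and the result is
# normalized by the total weight so the loss scale stays comparable.

def action_aware_loss(token_losses, action_mask, action_weight=2.0):
    """Reweight per-token SFT losses to emphasize action/grounding tokens.

    token_losses : list[float]  per-token cross-entropy values
    action_mask  : list[bool]   True where the token belongs to the emitted
                                action (e.g. click coordinates, element id)
    """
    weights = [action_weight if m else 1.0 for m in action_mask]
    weighted = [w * l for w, l in zip(weights, token_losses)]
    return sum(weighted) / sum(weights)
```

In the mixed-data setting the abstract describes, both reasoning-then-action and direct-action samples would pass through the same loss; direct-action samples simply contribute few or no reasoning tokens.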
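The RL side can be sketched similarly. Under partial verifiability, a "wrong" step may merely differ from the single demonstrated action, so the abstract proposes downweighting negative gradients adaptively and constraining updates with a KL trust region. The baseline choice, the scaling rule, the floor value, and the KL coefficient below are all illustrative assumptions, not the paper's formula.

```python
# Toy sketch of a per-step objective term combining success-adaptive
# scaling of negative advantages with a KL penalty toward a reference
# (SFT) policy. All names and constants are illustrative assumptions.

def scaled_advantage(reward, group_success_rate, kl, kl_coef=0.1, neg_floor=0.2):
    """Compute one step's objective contribution.

    reward             : 1.0 if the step matched the demonstrated action, else 0.0
    group_success_rate : mean reward over the sampled group (used as baseline)
    kl                 : KL divergence of the policy from the SFT reference
    """
    adv = reward - group_success_rate  # group-relative centering
    if adv < 0:
        # Downweight negative gradients: when the group rarely matches the
        # single verified action, a mismatch is weak evidence of a bad step.
        adv *= max(neg_floor, group_success_rate)
    return adv - kl_coef * kl  # KL trust-region penalty
```

The design intuition matches the abstract: the KL term keeps the policy near the SFT reference so offline step metrics remain predictive online, while the adaptive scale trusts negative signals only when the verifier agrees with most sampled actions.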
Problem

Research questions and friction points this paper is trying to address.

GUI agents
action-aligned reasoning
partial verifiability
grounding
long-horizon navigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

action-aware SFT
partially verifiable RL
KL regularization
GUI reasoning dataset
offline-to-online predictability