Learn Where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of existing reinforcement learning methods for GUI grounding: they need multiple rollouts per sample, and they receive sparse reward signals on hard instances. The authors propose GUI-SD, the first on-policy self-distillation (OPSD) framework for GUI grounding, which derives dense token-level supervision from a single rollout. It has two key components: a visually enriched privileged context for the teacher, built from the target bounding box and a Gaussian soft mask so that guidance is given without leaking exact coordinates, and an entropy-guided distillation mechanism that adaptively weights tokens by digit significance and teacher confidence. Across six GUI grounding benchmarks, GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency.
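To make the Gaussian soft mask idea concrete, here is a minimal sketch of one way such a mask could be built from a target bounding box. It is an illustration under stated assumptions, not the authors' implementation: the function names, the `sigma_scale` and `floor` parameters, and the choice to dim the screenshot by the mask are all hypothetical.

```python
import numpy as np


def gaussian_soft_mask(height, width, box, sigma_scale=0.5):
    """Return an (height, width) mask that peaks at 1.0 at the box center.

    box is (x1, y1, x2, y2) in pixels. The spread is tied to the box size
    through sigma_scale, which is an assumed hyperparameter.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    # Clamp the spread to at least one pixel so tiny boxes stay well defined.
    sx = max((x2 - x1) * sigma_scale, 1.0)
    sy = max((y2 - y1) * sigma_scale, 1.0)
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-0.5 * (((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2))


def apply_soft_mask(image, mask, floor=0.3):
    """Dim pixels far from the target (hypothetical way to use the mask).

    image is an (height, width, 3) float array in [0, 1]. Pixels keep at
    least `floor` of their brightness so the rest of the screen stays visible.
    """
    weight = floor + (1.0 - floor) * mask
    return image * weight[..., None]
```

The mask highlights a region rather than a point, which is consistent with the stated goal of guiding the teacher without handing it the exact coordinates as text.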

📝 Abstract

Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
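The entropy-guided weighting described in the abstract can be sketched roughly as follows. The abstract does not give the formula, so everything here is an assumption made for illustration: the geometric `decay` for digit significance, the `non_digit_weight`, confidence defined as one minus normalized teacher entropy, and the forward KL direction (teacher to student). The logits are assumed to be scored on the same tokens of a single student rollout, with the teacher additionally seeing the privileged context.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def teacher_confidence(teacher_probs):
    """One minus normalized entropy: near 1 when peaked, near 0 when uniform."""
    vocab = teacher_probs.shape[-1]
    entropy = -(teacher_probs * np.log(teacher_probs + 1e-12)).sum(axis=-1)
    return 1.0 - entropy / np.log(vocab)


def digit_significance(digit_ranks, decay=0.5, non_digit_weight=0.1):
    """Weight per token from its digit rank (assumed scheme).

    Rank 0 is the most significant digit of a coordinate, rank 1 the next,
    and so on. A rank of -1 marks a non-digit token.
    """
    ranks = np.asarray(digit_ranks)
    return np.where(ranks >= 0, decay ** np.maximum(ranks, 0), non_digit_weight)


def weighted_distill_loss(student_logits, teacher_logits, digit_ranks):
    """Weighted mean of per-token KL(teacher || student) over one rollout."""
    p_t = softmax(teacher_logits)
    p_s = softmax(student_logits)
    kl = (p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12))).sum(axis=-1)
    w = digit_significance(digit_ranks) * teacher_confidence(p_t)
    return (w * kl).sum() / (w.sum() + 1e-12)
```

The intuition behind weighting by digit rank is that an error in the leading digit of a coordinate moves the predicted click much further than an error in the last digit. The confidence term then downweights positions where the teacher itself is uncertain.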

Problem

Research questions and friction points this paper is trying to address.

GUI grounding

on-policy self-distillation

reinforcement learning

sparse reward

autonomous GUI agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

on-policy self-distillation

GUI grounding

privileged context