HyperClick: Advancing Reliable GUI Grounding via Uncertainty Calibration

📅 2025-10-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing GUI agents are overconfident in GUI grounding, the task of mapping language instructions to screen coordinates: predicted confidence aligns poorly with actual accuracy, so a single-point error can cause a whole task to fail. To address this, we propose HyperClick, the first framework integrating a dual-reward mechanism, truncated Gaussian spatial confidence modeling, and Brier score–driven uncertainty calibration to jointly optimize localization accuracy and confidence calibration. Our method unifies supervised fine-tuning and reinforcement fine-tuning, incorporates probabilistic spatial modeling, and introduces verbalized confidence assessment to enhance model introspection. Evaluated on seven mainstream benchmarks, HyperClick achieves state-of-the-art performance, reduces confidence calibration error by 32.7%, and substantially mitigates overfitting. This establishes a robust, trustworthy foundation for reliable GUI automation.

📝 Abstract

Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement fine-tuning (RFT), lack self-awareness of their capability boundaries, leading to overconfidence and unreliable predictions. We first systematically evaluate probabilistic and verbalized confidence in general and GUI-specific models, revealing a misalignment between confidence and actual accuracy. This misalignment is particularly critical in dynamic GUI automation tasks, where a single error can cause task failure. To address this, we propose HyperClick, a novel framework that enhances reliable GUI grounding through uncertainty calibration. HyperClick introduces a dual reward mechanism that combines a binary reward for correct actions with truncated Gaussian-based spatial confidence modeling, calibrated using the Brier score. This approach jointly optimizes grounding accuracy and confidence reliability, fostering introspective self-criticism. Extensive experiments on seven challenging benchmarks show that HyperClick achieves state-of-the-art performance while providing well-calibrated confidence. By enabling explicit confidence calibration and introspective self-criticism, HyperClick reduces overconfidence and supports more reliable GUI automation.
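The dual reward described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact formulas, so the function names, the unnormalized Gaussian, the `sigma_scale` parameter, and the simple sum of the two terms are all assumptions made for illustration. The sketch shows one plausible reading, where a click earns a binary reward if it lands inside the target box, and the model's verbalized confidence is scored against a truncated Gaussian target with a Brier-style squared error.

```python
import math

def binary_reward(click, box):
    """1.0 if the click lands inside the target box, else 0.0."""
    x, y = click
    x0, y0, x1, y1 = box
    return 1.0 if (x0 <= x <= x1 and y0 <= y <= y1) else 0.0

def spatial_confidence(click, box, sigma_scale=0.5):
    """Assumed target confidence: a Gaussian centered on the box center,
    truncated to zero outside the box. sigma_scale is a hypothetical knob
    that sets the standard deviation as a fraction of the half-width."""
    x, y = click
    x0, y0, x1, y1 = box
    if not (x0 <= x <= x1 and y0 <= y <= y1):
        return 0.0
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) / 2.0 * sigma_scale, 1e-6)
    sy = max((y1 - y0) / 2.0 * sigma_scale, 1e-6)
    return math.exp(-0.5 * (((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2))

def brier_reward(stated_conf, target_conf):
    """Brier-style term: 1 minus the squared gap between stated and target."""
    return 1.0 - (stated_conf - target_conf) ** 2

def dual_reward(click, box, stated_conf):
    """Assumed combination: plain sum of the correctness and calibration terms."""
    target = spatial_confidence(click, box)
    return binary_reward(click, box) + brier_reward(stated_conf, target)
```

Under these assumptions, a confident click at the box center scores highest, while a confident click outside the box is penalized more than a hesitant one, which is the behavior a calibration reward is meant to encourage.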

Problem

Research questions and friction points this paper is trying to address.

Addresses unreliable GUI grounding due to overconfidence in predictions

Improves uncertainty calibration for autonomous graphical user interface agents

Enhances confidence reliability and accuracy in dynamic GUI automation
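The gap between confidence and accuracy named above is commonly quantified with the Brier score and the expected calibration error (ECE). The sketch below uses the standard textbook definitions of both; it is not taken from the paper, and the function names, the equal-width binning, and the default of 10 bins are assumptions.

```python
def brier_score(confidences, correct):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((c - (1.0 if ok else 0.0)) ** 2
               for c, ok in zip(confidences, correct)) / len(confidences)

def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over equal-width bins."""
    if not confidences or len(confidences) != len(correct):
        raise ValueError("inputs must be non-empty and of equal length")
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, 1.0 if ok else 0.0))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(hit for _, hit in b) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

For example, a model that reports 0.9 confidence on four clicks but hits the target only once has an ECE of about 0.65, which is the kind of overconfidence the bullets above describe.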

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uncertainty calibration framework for GUI grounding

Dual reward mechanism with binary and spatial confidence

Brier score calibrated confidence for reliable automation
