🤖 AI Summary
Existing GUI agents are overconfident in GUI grounding, the task of mapping language instructions to screen coordinates: their predicted confidence aligns poorly with their actual accuracy, and a single mislocated click can cause a whole task to fail. To address this, we propose HyperClick, a framework that combines a dual-reward mechanism, truncated Gaussian spatial confidence modeling, and Brier score-driven uncertainty calibration to jointly optimize localization accuracy and confidence calibration. The method unifies supervised fine-tuning and reinforcement fine-tuning, incorporates probabilistic spatial modeling, and introduces verbalized confidence assessment to strengthen model introspection. Evaluated on seven mainstream benchmarks, HyperClick achieves state-of-the-art performance, reduces confidence calibration error by 32.7%, and substantially mitigates overconfidence, providing a more trustworthy foundation for reliable GUI automation.
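To make "confidence aligns poorly with accuracy" concrete, the following is a minimal sketch of two standard calibration measures: the Brier score named above and a binned expected calibration error. The function names, the ten equal-width bins, and the sample data are illustrative assumptions; the summary does not state which calibration-error metric the 32.7% figure refers to.

```python
def brier_score(confidences, outcomes):
    """Mean squared gap between stated confidence and the 0/1 outcome."""
    pairs = list(zip(confidences, outcomes))
    return sum((c - o) ** 2 for c, o in pairs) / len(pairs)


def expected_calibration_error(confidences, outcomes, n_bins=10):
    """Weighted average of |accuracy - mean confidence| over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for c, o in zip(confidences, outcomes):
        idx = min(int(c * n_bins), n_bins - 1)  # c == 1.0 falls in the last bin
        bins[idx].append((c, o))
    total = len(confidences)
    ece = 0.0
    for items in bins:
        if not items:
            continue
        mean_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(o for _, o in items) / len(items)
        ece += len(items) / total * abs(accuracy - mean_conf)
    return ece


# Hypothetical data: a model that always reports 0.9 but clicks correctly
# only half the time, i.e. an overconfident grounder.
conf = [0.9, 0.9, 0.9, 0.9]
hits = [1, 0, 1, 0]
print(brier_score(conf, hits))
print(expected_calibration_error(conf, hits))
```

A perfectly calibrated model that reported 0.5 on the same four clicks would have an expected calibration error of zero, which is the gap these metrics are meant to expose.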
📝 Abstract
Autonomous Graphical User Interface (GUI) agents rely on accurate GUI grounding, which maps language instructions to on-screen coordinates, to execute user commands. However, current models, whether trained via supervised fine-tuning (SFT) or reinforcement fine-tuning (RFT), lack self-awareness of their capability boundaries, leading to overconfidence and unreliable predictions. We first systematically evaluate probabilistic and verbalized confidence in general and GUI-specific models, revealing a misalignment between confidence and actual accuracy. This misalignment is particularly critical in dynamic GUI automation tasks, where a single error can cause task failure. To address this, we propose HyperClick, a novel framework that enhances reliable GUI grounding through uncertainty calibration. HyperClick introduces a dual reward mechanism that combines a binary reward for correct actions with truncated Gaussian-based spatial confidence modeling, calibrated using the Brier score. This approach jointly optimizes grounding accuracy and confidence reliability, fostering introspective self-criticism. Extensive experiments on seven challenging benchmarks show that HyperClick achieves state-of-the-art performance while providing well-calibrated confidence. By enabling explicit confidence calibration and introspective self-criticism, HyperClick reduces overconfidence and supports more reliable GUI automation.
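The dual reward described above can be sketched as follows. This is one plausible reading of the abstract, not the authors' implementation: the names `spatial_confidence_target` and `dual_reward`, the `sigma_scale` parameter, the choice to centre the Gaussian on the target box and truncate it at the box edge, and the unweighted sum of the two terms are all assumptions made for illustration.

```python
import math


def spatial_confidence_target(click, bbox, sigma_scale=0.25):
    """Truncated Gaussian over the target box (assumed form).

    Returns 1.0 at the box centre, decays towards the edges, and is
    truncated to 0.0 outside the box. Assumes the box has positive
    width and height.
    """
    x, y = click
    x1, y1, x2, y2 = bbox
    if not (x1 <= x <= x2 and y1 <= y <= y2):
        return 0.0  # truncation: no credit outside the target element
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    sx, sy = sigma_scale * (x2 - x1), sigma_scale * (y2 - y1)
    z = ((x - cx) / sx) ** 2 + ((y - cy) / sy) ** 2
    return math.exp(-0.5 * z)


def dual_reward(click, confidence, bbox):
    """Binary correctness reward plus a Brier-style calibration reward."""
    x, y = click
    x1, y1, x2, y2 = bbox
    hit = 1.0 if (x1 <= x <= x2 and y1 <= y <= y2) else 0.0
    target = spatial_confidence_target(click, bbox)
    # Brier-style term: highest when the stated confidence matches the target.
    calibration = 1.0 - (confidence - target) ** 2
    return hit + calibration


box = (100, 200, 180, 240)  # hypothetical button: x1, y1, x2, y2 in pixels
print(dual_reward((140, 220), 1.0, box))  # centre click, fully confident
print(dual_reward((300, 300), 0.9, box))  # miss, still confident
print(dual_reward((300, 300), 0.0, box))  # miss, admits uncertainty
```

Under these assumptions a confident miss scores lower than a miss reported with low confidence, which is how a reward of this shape would penalize overconfidence rather than only inaccuracy.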