HalluClear: Diagnosing, Evaluating and Mitigating Hallucinations in GUI Agents

📅 2026-04-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses hallucination-induced cascading failures in GUI agents during real-world deployment, where existing approaches lack fine-grained diagnosis, reliable evaluation, and targeted mitigation. To this end, the paper proposes HalluClear, a hallucination-focused suite tailored to GUI agents. It comprises an empirically grounded hallucination taxonomy, a calibrated three-stage evaluation workflow, and a mitigation scheme based on closed-loop structured reasoning that supports lightweight continual post-training with cold-start initialization. According to the authors, post-training on only 9K samples significantly reduces hallucinations, improving the agent’s grounding and action fidelity as a compute-efficient complement to large-scale training.

📝 Abstract

While progress in GUI agents has been largely driven by industrial-scale training, ungrounded hallucinations often trigger cascading failures in real-world deployments. Unlike general VLM domains, the GUI agent field lacks a hallucination-focused suite for fine-grained diagnosis, reliable evaluation, and targeted mitigation. To bridge this gap, we introduce HalluClear, a comprehensive suite for hallucination mitigation in GUI agents as a complement to computation-intensive scaling. HalluClear comprises: (1) a GUI-specific hallucination taxonomy derived from empirical failure analysis; (2) a calibrated three-stage evaluation workflow which enhances VLM-as-a-judge reliability via expert-annotated benchmarking and ensemble credibility estimation; and (3) a mitigation scheme based on closed-loop structured reasoning, enabling lightweight continual post-training with cold-start initialization for both generalist and GUI-specialist agents. Experiments across representative agents and public benchmarks demonstrate that post-training on only 9K samples within our suite can significantly reduce hallucinations, thereby improving grounding and action fidelity, offering a compute-efficient pathway to robust GUI automation.
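
The abstract does not spell out how "ensemble credibility estimation" is computed, so the following is only an illustrative sketch of one plausible reading, not the authors' method. It assumes each VLM judge is weighted by its agreement with expert annotations on a calibration set, and that the ensemble verdict is a weighted vote whose winning share serves as a credibility score. The function names, judge names, and the labels `"hallucinated"` and `"faithful"` are all hypothetical.

```python
from collections import defaultdict


def judge_accuracy(judge_labels, expert_labels):
    """Fraction of calibration items where a judge matches the expert label."""
    matches = sum(j == e for j, e in zip(judge_labels, expert_labels))
    return matches / len(expert_labels)


def ensemble_verdict(votes, weights):
    """Weighted vote over judges.

    Returns (winning_label, credibility), where credibility is the share of
    total judge weight that backs the winning label. Assumes at least one
    voting judge has a non-zero weight.
    """
    totals = defaultdict(float)
    for judge, label in votes.items():
        totals[label] += weights[judge]
    winner = max(totals, key=totals.get)
    return winner, totals[winner] / sum(totals.values())


# Hypothetical expert-annotated calibration set (assumption, not paper data).
expert = ["hallucinated", "faithful", "hallucinated", "faithful"]
calibration = {
    "judge_a": ["hallucinated", "faithful", "hallucinated", "faithful"],
    "judge_b": ["hallucinated", "hallucinated", "hallucinated", "faithful"],
    "judge_c": ["faithful", "faithful", "faithful", "hallucinated"],
}

# Stage 1 (assumed): calibrate each judge against the expert labels.
weights = {name: judge_accuracy(labels, expert) for name, labels in calibration.items()}

# Stage 2 (assumed): combine the judges' votes on a new agent step.
votes = {"judge_a": "hallucinated", "judge_b": "hallucinated", "judge_c": "faithful"}
label, credibility = ensemble_verdict(votes, weights)
print(label, credibility)
```

Under these assumptions, a low credibility score would flag a verdict where well-calibrated judges disagree, which could then be routed to human review rather than trusted automatically.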

Problem

Research questions and friction points this paper is trying to address.

hallucination

GUI agents

evaluation

diagnosis

mitigation

Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination mitigation

GUI agents

structured reasoning