HalluClear: Diagnosing, Evaluating and Mitigating Hallucinations in GUI Agents

📅 2026-04-19
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses hallucination-induced cascading failures in GUI agents during real-world deployment, where existing approaches lack fine-grained diagnosis, reliable evaluation, and targeted mitigation. The paper proposes HalluClear, a hallucination-focused suite tailored to GUI environments. It introduces an empirically grounded hallucination taxonomy, a calibrated three-stage evaluation workflow, and a lightweight mitigation scheme based on closed-loop structured reasoning with cold-start post-training. Post-training on only 9K samples significantly reduces hallucinations across representative agents and public benchmarks, improving grounding and action fidelity without compute-intensive scaling.

📝 Abstract
While progress in GUI agents has been largely driven by industrial-scale training, ungrounded hallucinations often trigger cascading failures in real-world deployments. Unlike general VLM domains, the GUI agent field lacks a hallucination-focused suite for fine-grained diagnosis, reliable evaluation, and targeted mitigation. To bridge this gap, we introduce HalluClear, a comprehensive suite for hallucination mitigation in GUI agents as a complement to computation-intensive scaling. HalluClear comprises: (1) a GUI-specific hallucination taxonomy derived from empirical failure analysis; (2) a calibrated three-stage evaluation workflow which enhances VLM-as-a-judge reliability via expert-annotated benchmarking and ensemble credibility estimation; and (3) a mitigation scheme based on closed-loop structured reasoning, enabling lightweight continual post-training with cold-start initialization for both generalist and GUI-specialist agents. Experiments across representative agents and public benchmarks demonstrate that post-training on only 9K samples within our suite can significantly reduce hallucinations, thereby improving grounding and action fidelity, offering a compute-efficient pathway to robust GUI automation.
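
The abstract does not say how "ensemble credibility estimation" is computed. As a rough illustration only, one common way to combine several VLM judges is a majority vote whose agreement ratio serves as a credibility score. The sketch below assumes that scheme; the function names, the label strings, and the 0.75 review threshold are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def ensemble_verdict(votes):
    """Combine judge labels by majority vote (an assumed scheme, not the paper's).

    Returns (label, credibility), where credibility is the fraction of
    judges that agree with the winning label.
    """
    if not votes:
        raise ValueError("at least one judge vote is required")
    counts = Counter(votes)
    # On a tie, most_common keeps the label that was seen first.
    label, top = counts.most_common(1)[0]
    return label, top / len(votes)


def needs_human_review(credibility, threshold=0.75):
    """Flag low-agreement verdicts; the threshold is a hypothetical value."""
    return credibility < threshold


# Three hypothetical judges label one agent step.
label, credibility = ensemble_verdict(["hallucinated", "hallucinated", "grounded"])
print(label, round(credibility, 2), needs_human_review(credibility))
```

Under this assumed design, steps where the judges disagree can be routed to expert annotators, which is one plausible way to tie an ensemble score back to an expert-annotated benchmark.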
Problem

Research questions and friction points this paper is trying to address.

hallucination
GUI agents
evaluation
diagnosis
mitigation
Innovation

Methods, ideas, or system contributions that make the work stand out.

hallucination mitigation
GUI agents
structured reasoning
VLM-as-a-judge
post-training
Chao Jin
MAIS&NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, UCAS
Wenkui Yang
MAIS&NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, UCAS
Hao Sun
MAIS&NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, UCAS
Yuqi Liao
Meituan
Qianyi Jiang
Meituan
Kai Zhou
Meituan
Jie Cao
Institute of Automation, Chinese Academy of Sciences
Computer Vision
Ran He
MAIS&NLPR, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, UCAS
Huaibo Huang
NLPR, MAIS, CASIA
Computer Vision, Generative Models, Low-level Vision, Face Recognition