GEM: Gaussian Embedding Modeling for Out-of-Distribution Detection in GUI Agents

📅 2025-05-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

GUI agents operating on real devices can fail, or even create security risks, when they receive out-of-distribution (OOD) instructions. To address this, we propose an OOD detection method that models the distribution of distances from input embeddings to the centroid of the agent's internal representation space. We first observe that in-distribution inputs cluster by their distance from the centroid, and then fit a Gaussian mixture model (GMM) to that distance distribution, which formalizes the agent's capability boundary. The approach combines embedding extraction from the GUI agent, statistical modeling of the distances, and evaluation across platforms and devices. Across eight datasets spanning smartphones, computers, and web browsers, it improves average OOD detection accuracy by 23.70% over the best-performing baseline. Experiments on nine different backbones indicate that the method generalizes across architectures.

📝 Abstract

Graphical user interface (GUI) agents have recently emerged as an intriguing paradigm for human-computer interaction, capable of automatically executing user instructions to operate intelligent terminal devices. However, when encountering out-of-distribution (OOD) instructions that violate environmental constraints or exceed the current capabilities of agents, GUI agents may suffer task breakdowns or even pose security threats. Therefore, effective OOD detection for GUI agents is essential. Traditional OOD detection methods perform suboptimally in this domain due to the complex embedding space and evolving GUI environments. In this work, we observe that the in-distribution input semantic space of GUI agents exhibits a clustering pattern with respect to the distance from the centroid. Based on this finding, we propose GEM, a novel method that fits a Gaussian mixture model over the distances of input embeddings extracted from the GUI agent, which reflect its capability boundary. Evaluated on eight datasets spanning smartphones, computers, and web browsers, our method achieves an average accuracy improvement of 23.70% over the best-performing baseline. Experiments on nine different backbones verify the generalization ability of our method. The code is available at https://github.com/Wuzheng02/GEM-OODforGUIagents.
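
The idea in the abstract (measure each input embedding's distance from the in-distribution centroid, fit a Gaussian mixture to those distances, and treat poorly explained distances as OOD) can be illustrated with a minimal sketch. This is not the authors' implementation; their repository is the authoritative source. Everything here is an assumption made for illustration: the helper names (`fit_detector`, `is_ood`, `make_shell_embeddings`), the fixed number of mixture components, the Euclidean distance, the low-likelihood quantile used as a decision rule, and the synthetic 2-D "embeddings" that stand in for vectors extracted from a GUI agent.

```python
import math
import random


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n, dim = len(vectors), len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def distance(v, c):
    """Euclidean distance between two vectors (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v, c)))


def gauss_pdf(x, mu, var):
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=100, min_var=1e-6):
    """Fit a k-component 1-D Gaussian mixture to xs with plain EM."""
    n = len(xs)
    ordered = sorted(xs)
    # Initialise means at evenly spaced quantiles, with a shared variance.
    mus = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    mean = sum(xs) / n
    var0 = max(sum((x - mean) ** 2 for x in xs) / n, min_var)
    variances, weights = [var0] * k, [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            p = [w * gauss_pdf(x, m, v) for w, m, v in zip(weights, mus, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            mus[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - mus[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, min_var)
    return weights, mus, variances


def log_likelihood(x, model):
    weights, mus, variances = model
    density = sum(w * gauss_pdf(x, m, v) for w, m, v in zip(weights, mus, variances))
    return math.log(max(density, 1e-300))


def fit_detector(id_embeddings, k=2, quantile=0.05):
    """Hypothetical detector: centroid, distance GMM, and a likelihood cutoff."""
    c = centroid(id_embeddings)
    dists = [distance(v, c) for v in id_embeddings]
    model = fit_gmm_1d(dists, k=k)
    # Assumed rule: the lowest `quantile` of in-distribution scores sets the cutoff.
    scores = sorted(log_likelihood(d, model) for d in dists)
    threshold = scores[int(quantile * len(scores))]
    return c, model, threshold


def is_ood(embedding, detector):
    c, model, threshold = detector
    return log_likelihood(distance(embedding, c), model) < threshold


def make_shell_embeddings(radii=(1.0, 3.0), per_shell=200, noise=0.05, seed=0):
    """Synthetic 2-D stand-in: points on noisy rings around the origin."""
    rng = random.Random(seed)
    points = []
    for r in radii:
        for i in range(per_shell):
            angle = 2 * math.pi * i / per_shell
            radius = r + rng.gauss(0.0, noise)
            points.append([radius * math.cos(angle), radius * math.sin(angle)])
    return points


if __name__ == "__main__":
    detector = fit_detector(make_shell_embeddings())
    print(is_ood([1.0, 0.0], detector))   # lies on an in-distribution ring
    print(is_ood([10.0, 0.0], detector))  # far outside both rings
```

In this toy setup, a point between the two rings is also flagged, even though it is closer to the centroid than the outer ring. That is the motivation for modeling the distance distribution with a mixture rather than applying a single maximum-distance cutoff. With real agents, the embeddings would come from the backbone model, and the component count and cutoff would need to be chosen on held-out data.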

Problem

Research questions and friction points this paper is trying to address.

Detecting out-of-distribution instructions in GUI agents

Improving OOD detection in complex GUI environments

Enhancing accuracy of capability boundary identification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gaussian mixture model for OOD detection

Clustering pattern in semantic space

Average accuracy improvement of 23.70% over the best-performing baseline

🔎 Similar Papers

Concept Matching with Agent for Out-of-Distribution Detection