UI-KOBE: Knowledge-Oriented Behavior Exploration for Lightweight Graph-Guided GUI Agents

📅 2026-05-28
🤖 AI Summary
This work addresses the limited reliability of lightweight mobile GUI agents that must plan and execute tasks end-to-end from screenshots alone, a limitation that stems from constrained model capacity. The authors propose UI-KOBE, a framework that equips lightweight agents with application-specific UI state knowledge graphs. The framework autonomously builds each graph through exploration, then uses the graph structure to guide action selection at inference time. By combining lightweight vision-language reasoning with graph guidance, UI-KOBE aims to help small models complete mobile GUI tasks more effectively while reducing reliance on large models. The approach is positioned as a step toward efficient, interpretable, and privacy-conscious on-device deployment.
📝 Abstract
Recent advances in mobile GUI agents have shown strong potential for automating mobile tasks, but most effective systems still depend on large vision-language models for screenshot understanding and long-horizon planning. Small GUI agents that can be deployed directly on mobile devices are more attractive for practical use, offering lower inference cost and better protection of sensitive on-device information. However, due to limited model capacity, such lightweight agents remain unreliable when planning and executing GUI tasks end-to-end from screenshots alone. We propose Knowledge-Oriented Behavior Exploration (UI-KOBE), a framework that improves lightweight mobile GUI agents with reusable app-specific graph knowledge. UI-KOBE first autonomously explores a mobile application and constructs an app knowledge graph, where nodes represent distinct UI states and edges represent executable transitions. At runtime, a lightweight GUI agent uses the graph as external guidance: given a user task and the current screenshot, it identifies the current graph node and selects among self-loop actions, neighboring transitions, task completion, or fallback free actions associated with that node. By supporting runtime decisions with app-specific graph guidance, UI-KOBE reduces the burden of end-to-end GUI planning and helps lightweight models perform mobile GUI tasks more effectively, offering a practical step toward efficient, interpretable, and privacy-conscious on-device GUI agents.
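The graph structure and runtime choice set described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `UINode`, `AppGraph`, `add_edge`, and `candidates`, the string action labels, and the `DONE`/`FALLBACK` placeholders are all hypothetical and chosen only for illustration. The sketch assumes that each node stores its self-loop actions and outgoing transitions, and that the agent picks from one flat list of options for the node it has identified.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class UINode:
    """One distinct UI state of the app (hypothetical representation)."""
    node_id: str
    description: str
    # Actions that leave the agent in the same UI state (e.g. scrolling).
    self_loops: List[str] = field(default_factory=list)
    # Maps an action label to the id of the node it leads to.
    transitions: Dict[str, str] = field(default_factory=dict)


@dataclass
class AppGraph:
    """App knowledge graph: nodes are UI states, edges are transitions."""
    nodes: Dict[str, UINode] = field(default_factory=dict)

    def add_node(self, node_id: str, description: str) -> UINode:
        # Reuse the existing node if this state was already discovered.
        return self.nodes.setdefault(node_id, UINode(node_id, description))

    def add_edge(self, src: str, action: str, dst: str) -> None:
        node = self.nodes[src]
        if src == dst:
            if action not in node.self_loops:
                node.self_loops.append(action)
        else:
            node.transitions[action] = dst

    def candidates(self, node_id: str) -> List[Tuple[str, str]]:
        """List the four option types available at the given node."""
        node = self.nodes[node_id]
        options = [("self_loop", a) for a in node.self_loops]
        options += [("transition", a) for a in node.transitions]
        options.append(("complete", "DONE"))          # task completion
        options.append(("free_action", "FALLBACK"))   # action outside the graph
        return options


# Usage: a two-screen toy app recorded during exploration.
graph = AppGraph()
graph.add_node("home", "Home screen")
graph.add_node("settings", "Settings screen")
graph.add_edge("home", "tap_settings_icon", "settings")
graph.add_edge("home", "scroll_down", "home")
graph.add_edge("settings", "tap_back", "home")
home_options = graph.candidates("home")
```

In this sketch the lightweight model would only need to choose one entry from `home_options` rather than plan a full action sequence, which is one way to read the abstract's claim that graph guidance reduces the burden of end-to-end planning. How the actual system matches a screenshot to a node or ranks the options is not specified here.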
Problem

Research questions and friction points this paper is trying to address.

lightweight GUI agents
mobile GUI automation
end-to-end planning
on-device deployment
screenshot understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

lightweight GUI agents
knowledge graph
behavior exploration
on-device AI
mobile automation