KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation

📅 2025-08-30
🤖 AI Summary
Large language models (LLMs) suffer from insufficient decision-making capability in complex mobile GUI tasks due to a lack of domain-specific application knowledge. To address this, we propose a knowledge graph–driven retrieval-augmented generation (RAG) framework that transforms sparse, low-quality UI transition graphs (UTGs) into structured vector knowledge graphs, incorporates intent-guided multi-hop retrieval, and enables real-time navigation path planning. Our contributions are fourfold: (1) the first UTG-enhanced RAG architecture tailored for the Chinese mobile ecosystem; (2) two cross-application benchmark suites; (3) state-of-the-art performance on mainstream mobile apps: 75.8% task success rate, 84.6% decision accuracy, and an average of 4.1 steps per task; and (4) empirical validation of strong generalizability to web and desktop GUI environments.

📝 Abstract
Despite recent progress, Graphical User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks because they have limited app-specific knowledge. UI Transition Graphs (UTGs) offer structured navigation representations, but they remain underutilized because extraction is poor and integration is inefficient. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. By leveraging an intent-guided LLM search method, KG-RAG generates actionable navigation paths that enhance agent decision-making. Experiments across diverse mobile apps show that KG-RAG outperforms existing methods, achieving a 75.8% success rate (8.9% improvement over AutoDroid), 84.6% decision accuracy (8.1% improvement), and a reduction in average task steps from 4.5 to 4.1. Additionally, we present KG-Android-Bench and KG-Harmony-Bench, two benchmarks tailored to the Chinese mobile ecosystem for future research. Finally, KG-RAG transfers to web and desktop environments (+40% success rate (SR) on Weibo-web; +20% on QQ Music-desktop), and a UTG cost ablation shows that accuracy saturates at roughly 4 hours per complex app, enabling practical deployment trade-offs.
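
To make the idea of turning a UTG into an actionable navigation path concrete, here is a minimal sketch. It is not the paper's implementation: the toy graph, the screen and action names, and the `plan_path` helper are all hypothetical, and plain breadth-first search stands in for whatever search procedure KG-RAG actually uses.

```python
from collections import deque

# Hypothetical toy UTG (not from the paper): screen -> {action: next_screen}.
UTG = {
    "home": {"tap_search": "search", "tap_profile": "profile"},
    "search": {"submit_query": "results"},
    "results": {"tap_item": "detail"},
    "profile": {"tap_settings": "settings"},
    "detail": {},
    "settings": {},
}


def plan_path(utg, start, goal):
    """Breadth-first search over the UTG.

    Returns the shortest list of (screen, action) steps leading from
    `start` to `goal`, an empty list if they are the same screen,
    or None if the goal is unreachable.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        screen, steps = queue.popleft()
        if screen == goal:
            return steps
        for action, next_screen in utg.get(screen, {}).items():
            if next_screen not in visited:
                visited.add(next_screen)
                queue.append((next_screen, steps + [(screen, action)]))
    return None


if __name__ == "__main__":
    for screen, action in plan_path(UTG, "home", "detail"):
        print(f"on '{screen}': {action}")
```

In a setup like this, the planned steps could be handed to the agent as navigation hints in its prompt, so that it does not have to explore the app blindly.
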

Problem

Research questions and friction points this paper is trying to address.

GUI agents lack app-specific knowledge for complex mobile tasks
UI Transition Graphs are underutilized due to poor extraction methods
Existing frameworks inefficiently integrate structured navigation representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms UI graphs into structured vector databases
Uses intent-guided LLM search for navigation paths
Achieves efficient real-time retrieval for GUI agents
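
The retrieval side of these points can be illustrated with a small sketch, under loud assumptions: the screen descriptions are invented, and a bag-of-words vector with cosine similarity stands in for the learned embeddings and vector database a real system would use. The names `SCREENS`, `embed`, `cosine`, and `retrieve` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

# Hypothetical screen descriptions (not from the paper); in practice these
# would be derived from UTG nodes and stored in a vector database.
SCREENS = {
    "search": "search box type keyword find songs",
    "settings": "settings account privacy notifications",
    "playlist": "playlist saved songs favorites library",
}


def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(intent, screens, k=1):
    """Return the names of the k screens most similar to the user's intent."""
    query = embed(intent)
    ranked = sorted(
        screens,
        key=lambda name: cosine(query, embed(screens[name])),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    print(retrieve("change privacy settings", SCREENS))
```

A retrieved screen could then serve as the goal node for path planning over the graph, which is one plausible way to connect retrieval to navigation.
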