KG-RAG: Enhancing GUI Agent Decision-Making via Knowledge Graph-Driven Retrieval-Augmented Generation

📅 2025-08-30

🤖 AI Summary

Large language models (LLMs) make poor decisions in complex mobile GUI tasks because they lack domain-specific application knowledge. To address this, we propose a knowledge graph-driven retrieval-augmented generation (RAG) framework that transforms sparse, low-quality UI transition graphs (UTGs) into structured vector knowledge graphs, incorporates intent-guided multi-hop retrieval, and enables real-time navigation path planning. Our contributions are fourfold: (1) the first UTG-enhanced RAG architecture tailored for the Chinese mobile ecosystem; (2) two cross-application benchmark suites; (3) state-of-the-art performance on mainstream mobile apps, with a 75.8% task success rate, 84.6% decision accuracy, and an average of 4.1 steps per task; and (4) empirical validation of strong generalizability to web and desktop GUI environments.

📝 Abstract

Despite recent progress, Graphical User Interface (GUI) agents powered by Large Language Models (LLMs) struggle with complex mobile tasks due to limited app-specific knowledge. While UI Transition Graphs (UTGs) offer structured navigation representations, they are underutilized due to poor extraction and inefficient integration. We introduce KG-RAG, a Knowledge Graph-driven Retrieval-Augmented Generation framework that transforms fragmented UTGs into structured vector databases for efficient real-time retrieval. By leveraging an intent-guided LLM search method, KG-RAG generates actionable navigation paths, enhancing agent decision-making. Experiments across diverse mobile apps show that KG-RAG outperforms existing methods, achieving a 75.8% success rate (8.9% improvement over AutoDroid) and 84.6% decision accuracy (8.1% improvement), while reducing average task steps from 4.5 to 4.1. Additionally, we present KG-Android-Bench and KG-Harmony-Bench, two benchmarks tailored to the Chinese mobile ecosystem for future research. Finally, KG-RAG transfers to web and desktop environments (+40% success rate on Weibo-web; +20% on QQ Music-desktop), and a UTG cost ablation shows that accuracy saturates at ~4h per complex app, enabling practical deployment trade-offs.

Problem

Research questions and friction points this paper is trying to address.

GUI agents lack app-specific knowledge for complex mobile tasks

UI Transition Graphs are underutilized due to poor extraction methods

Existing frameworks inefficiently integrate structured navigation representations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transforms UI graphs into structured vector databases

Uses intent-guided LLM search for navigation paths

Achieves efficient real-time retrieval for GUI agents
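
The three ideas above can be illustrated with a minimal sketch. It is not the paper's implementation: the toy UTG, the screen descriptions, and every function name are hypothetical, a bag-of-words cosine similarity stands in for a learned embedding model and vector database, and a breadth-first search stands in for the intent-guided LLM search. It only shows the general shape of the pipeline: index screens as vectors, retrieve the screen that best matches the user's intent, then read a navigation path off the transition graph.

```python
import math
from collections import Counter, deque

# Hypothetical UTG: each screen maps UI actions to the screen they lead to.
UTG = {
    "home": {"tap profile tab": "profile", "tap search bar": "search"},
    "profile": {"tap settings icon": "settings"},
    "search": {},
    "settings": {"tap privacy": "privacy"},
    "privacy": {},
}

# Hypothetical natural-language descriptions of each screen.
DESCRIPTIONS = {
    "home": "main feed home page",
    "profile": "user profile page with avatar and posts",
    "search": "search page for keywords",
    "settings": "settings page for account options",
    "privacy": "privacy settings page to control who can see posts",
}


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a real embedding)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(descriptions):
    """Turn screen descriptions into an in-memory vector index."""
    return {screen: embed(text) for screen, text in descriptions.items()}


def retrieve_target(index, intent):
    """Return the screen whose description is most similar to the intent."""
    query = embed(intent)
    return max(index, key=lambda screen: cosine(query, index[screen]))


def plan_path(utg, start, goal):
    """Breadth-first search for a shortest action sequence from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        screen, actions = queue.popleft()
        if screen == goal:
            return actions
        for action, next_screen in utg.get(screen, {}).items():
            if next_screen not in seen:
                seen.add(next_screen)
                queue.append((next_screen, actions + [action]))
    return None  # goal is unreachable from start


if __name__ == "__main__":
    index = build_index(DESCRIPTIONS)
    target = retrieve_target(index, "change privacy settings")
    print(target, plan_path(UTG, "home", target))
```

In this sketch the retrieved path would be handed to the agent as extra context for its next decision. A real system would presumably replace the toy similarity with a proper embedding model and handle ambiguous intents that match several screens.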
