🤖 AI Summary
Existing LLM-based agents face efficiency bottlenecks in API-free, pixel-level GUI environments: reliant on local visual observations, they make myopic decisions and depend on inefficient trial-and-error, which hinders long-horizon planning and skill transfer. To address this, we propose the State-Action Knowledge Graph (SA-KG), a persistent, topology-aware graph-structured memory that aligns visually distinct but functionally similar states and enables experience reuse. We further design a hybrid intrinsic reward mechanism that decouples strategic policy planning from stochastic exploration, enabling principled action evaluation under delayed rewards. Our method integrates LLM-based reasoning, graph-structured memory, and a hybrid reward scheme grounded in novelty and state-value estimation. Evaluated on *Civilization V* and *Slay the Spire*, our approach substantially improves exploration efficiency, zero-shot generalization, and deep strategic reasoning, outperforming state-of-the-art pixel-level agent methods.
📝 Abstract
Most existing software lacks accessible Application Programming Interfaces (APIs), requiring agents to operate solely through pixel-based Graphical User Interfaces (GUIs). In this API-free setting, large language model (LLM)-based agents face severe efficiency bottlenecks: limited to local visual experiences, they make myopic decisions and rely on inefficient trial-and-error, hindering both skill acquisition and long-term planning. To address these challenges, we propose KG-Agent, an experience-driven learning framework that structures an agent's raw pixel-level interactions into a persistent State-Action Knowledge Graph (SA-KG). KG-Agent overcomes inefficient exploration by linking functionally similar but visually distinct GUI states, forming a rich neighborhood of experience that enables the agent to generalize from a diverse set of historical strategies. To support long-horizon reasoning, we design a hybrid intrinsic reward mechanism based on the graph topology, combining a state-value reward for exploiting known high-value pathways with a novelty reward that encourages targeted exploration. This approach decouples strategic planning from pure discovery, allowing the agent to effectively value setup actions with delayed gratification. We evaluate KG-Agent in two complex, open-ended GUI-based decision-making environments (Civilization V and Slay the Spire), demonstrating significant improvements in exploration efficiency and strategic depth over state-of-the-art methods.
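To make the core ideas concrete, the sketch below illustrates a minimal state-action graph memory with a hybrid intrinsic reward that sums an exploitation term (value of the known neighborhood) and an exploration term (count-based novelty). All class and method names, the incremental value update, and the `1/sqrt(1+visits)` novelty bonus are illustrative assumptions, not the paper's actual implementation.

```python
from collections import defaultdict
import math

class StateActionKG:
    """Illustrative sketch of a persistent state-action graph memory.

    States are abstract keys (e.g. hashes of GUI features); edges record
    actions taken between states. This is a simplified stand-in for the
    SA-KG described in the abstract, not the authors' implementation.
    """

    def __init__(self):
        self.edges = defaultdict(list)    # state -> [(action, next_state)]
        self.visits = defaultdict(int)    # state -> visit count
        self.value = defaultdict(float)   # state -> estimated value

    def record(self, state, action, next_state, reward, alpha=0.1):
        """Add a transition and update the reached state's value estimate."""
        self.edges[state].append((action, next_state))
        self.visits[next_state] += 1
        # Simple incremental (exponential moving average) value update.
        self.value[next_state] += alpha * (reward - self.value[next_state])

    def neighborhood_value(self, state):
        """Exploitation signal: mean estimated value of known successors."""
        successors = [s for _, s in self.edges[state]]
        if not successors:
            return 0.0
        return sum(self.value[s] for s in successors) / len(successors)

    def novelty(self, state):
        """Exploration signal: count-based novelty bonus for rarely seen states."""
        return 1.0 / math.sqrt(1 + self.visits[state])

    def intrinsic_reward(self, state, beta=0.5):
        """Hybrid intrinsic reward: value exploitation + weighted novelty."""
        return self.neighborhood_value(state) + beta * self.novelty(state)
```

Because the value term is computed over the graph neighborhood rather than a single observed outcome, a "setup" action that leads into a high-value region of the graph can score well even when its immediate reward is zero, which is the delayed-gratification effect the abstract describes.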