🤖 AI Summary
To address the inefficient knowledge acquisition of LLM-based agents in hard-exploration tasks, this paper proposes GLoW, a dual-scale world-model framework. Methodologically, GLoW decouples global frontier discovery from local trial-and-error learning to form a synergistic exploration mechanism; it introduces a Multi-path Advantage Reflection module that modulates the exploration policy using advantage signals; and it integrates LLM-based reasoning, incremental learning, and trajectory-frontier maintenance. Evaluated on the Jericho text-game benchmark, GLoW achieves state-of-the-art performance among LLM-based approaches, comparable to advanced reinforcement learning methods, while requiring 100–800× fewer environment interactions. This demonstrates substantial gains in both exploration and sample efficiency.
📝 Abstract
LLM-based agents have seen promising advances, yet they remain limited in "hard-exploration" tasks that require learning new knowledge through exploration. We present GLoW, a novel approach leveraging dual-scale world models: at the global scale it maintains a trajectory frontier of high-value discoveries, while at the local scale it learns from trial-and-error in exploration through a Multi-path Advantage Reflection mechanism that infers advantage-based progress signals to guide exploration. To evaluate our framework on hard-exploration tasks, we tackle the Jericho benchmark suite of text-based games, where GLoW achieves a new state-of-the-art performance for LLM-based approaches. Compared to state-of-the-art RL-based methods, our approach achieves comparable performance while requiring 100–800× fewer environment interactions.
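The dual-scale loop described above can be caricatured in a few lines. The sketch below is a minimal illustration under assumed stand-ins, not the authors' implementation: `estimate_advantage` is a toy placeholder for the Multi-path Advantage Reflection step (which in GLoW would be LLM-based reasoning over multiple trajectories), and `step` is a toy environment. Only the structure is faithful: a global frontier of high-value trajectories seeds local rollouts, and an advantage-like signal steers action choice within each rollout.

```python
import random

def estimate_advantage(trajectory, action):
    # Stand-in for Multi-path Advantage Reflection: in GLoW, an LLM
    # infers an advantage-based progress signal; here it is random.
    return random.random()

def step(state, action):
    # Toy environment transition returning (next_state, reward).
    return state + [action], random.random()

def dual_scale_explore(episodes=5, horizon=4, frontier_size=3):
    frontier = []  # global scale: best (score, trajectory) pairs so far
    for _ in range(episodes):
        # Local scale: restart from a promising frontier trajectory
        # when one exists, otherwise from scratch.
        state = list(random.choice(frontier)[1]) if frontier else []
        total = 0.0
        for _ in range(horizon):
            # Greedily pick the action with the highest advantage signal.
            candidates = ["look", "take", "open", "go"]
            action = max(candidates, key=lambda a: estimate_advantage(state, a))
            state, reward = step(state, action)
            total += reward
        # Global scale: fold the new discovery into the frontier and
        # keep only the highest-scoring trajectories.
        frontier.append((total, state))
        frontier = sorted(frontier, key=lambda t: t[0], reverse=True)[:frontier_size]
    return frontier

frontier = dual_scale_explore()
```

The separation of concerns is the point: the frontier update is the only global-scale operation, while everything inside the inner loop is local trial-and-error guided by the advantage signal.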