🤖 AI Summary
To address the inefficient knowledge acquisition of LLM-based agents in hard-exploration tasks, this paper proposes GLoW, a dual-scale world-model framework. Methodologically, GLoW decouples global frontier discovery from local trial-and-error learning to form a synergistic exploration mechanism; it introduces a Multi-path Advantage Reflection module that modulates the exploration policy using advantage signals; and it integrates LLM-based reasoning, incremental learning, and trajectory-frontier maintenance. Evaluated on the Jericho text-game benchmark, GLoW achieves state-of-the-art performance among LLM-based approaches, comparable to advanced reinforcement learning methods, while requiring 100–800× fewer environment interactions. This demonstrates substantial gains in both exploration and sample efficiency.
📝 Abstract
LLM-based agents have seen promising advances, yet they remain limited in "hard-exploration" tasks that require learning new knowledge through exploration. We present GLoW, a novel approach leveraging dual-scale world models: at the global scale it maintains a trajectory frontier of high-value discoveries, while at the local scale it learns from trial-and-error in exploration through a Multi-path Advantage Reflection mechanism that infers advantage-based progress signals to guide exploration. To evaluate our framework on hard-exploration tasks, we tackle the Jericho benchmark suite of text-based games, where GLoW achieves a new state-of-the-art performance for LLM-based approaches. Compared to state-of-the-art RL-based methods, our approach achieves comparable performance while requiring 100–800× fewer environment interactions.
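The dual-scale loop described above can be caricatured in a few lines. The sketch below is a minimal illustration under assumed stand-ins, not the authors' implementation: `estimate_advantage` is a toy placeholder for the Multi-path Advantage Reflection step (which in GLoW would be LLM-based reasoning over multiple trajectories), and `step` is a toy environment. Only the structure is faithful: a global frontier of high-value trajectories seeds local rollouts, and an advantage-like signal steers action choice within each rollout.

```python
import random

def estimate_advantage(trajectory, action):
    # Stand-in for Multi-path Advantage Reflection: in GLoW, an LLM
    # infers an advantage-based progress signal; here it is random.
    return random.random()

def step(state, action):
    # Toy environment transition returning (next_state, reward).
    return state + [action], random.random()

def dual_scale_explore(episodes=5, horizon=4, frontier_size=3):
    frontier = []  # global scale: best (score, trajectory) pairs so far
    for _ in range(episodes):
        # Local scale: restart from a promising frontier trajectory
        # when one exists, otherwise from scratch.
        state = list(random.choice(frontier)[1]) if frontier else []
        total = 0.0
        for _ in range(horizon):
            # Greedily pick the action with the highest advantage signal.
            candidates = ["look", "take", "open", "go"]
            action = max(candidates, key=lambda a: estimate_advantage(state, a))
            state, reward = step(state, action)
            total += reward
        # Global scale: fold the new discovery into the frontier and
        # keep only the highest-scoring trajectories.
        frontier.append((total, state))
        frontier = sorted(frontier, key=lambda t: t[0], reverse=True)[:frontier_size]
    return frontier

frontier = dual_scale_explore()
```

The separation of concerns is the point: the frontier update is the only global-scale operation, while everything inside the inner loop is local trial-and-error guided by the advantage signal.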