π€ AI Summary
Large language models (LLMs) exhibit limited continual learning capabilities and poor reusability of historical experience in open-world environments such as Minecraft.
Method: This paper proposes a memory-augmented embodied agent framework that formalizes experiences as retrievable and evolvable natural-language memory tuplesβ(state, task, plan, outcome)βand integrates contrastive learning for embedding, approximate nearest-neighbor (ANN) retrieval, and task-driven online memory updates to enable zero-shot planning and closed-loop dynamic plan refinement.
Contribution/Results: It is the first work to formalize the human mental model as a structured, evolvable linguistic memory system. Evaluated on the MineDojo benchmark, the framework achieves a 37% improvement in task completion rate and a 2.1Γ increase in cross-task generalization success, while preserving zero-shot adaptability.
π Abstract
While large language models (LLMs) have shown promising capabilities as zero-shot planners for embodied agents, their inability to learn from experience and build persistent mental models limits their robustness in complex open-world environments like Minecraft. We introduce MINDSTORES, an experience-augmented planning framework that enables embodied agents to build and leverage mental models through natural interaction with their environment. Drawing inspiration from how humans construct and refine cognitive mental models, our approach extends existing zero-shot LLM planning by maintaining a database of past experiences that informs future planning iterations. The key innovation is representing accumulated experiences as natural language embeddings of (state, task, plan, outcome) tuples, which can then be efficiently retrieved and reasoned over by an LLM planner to generate insights and guide plan refinement for novel states and tasks. Through extensive experiments in the MineDojo environment, a simulation environment for agents in Minecraft that provides low-level controls for Minecraft, we find that MINDSTORES learns and applies its knowledge significantly better than existing memory-based LLM planners while maintaining the flexibility and generalization benefits of zero-shot approaches, representing an important step toward more capable embodied AI systems that can learn continuously through natural experience.