🤖 AI Summary
Existing Minecraft agents are limited by narrowly defined action spaces and must learn long-horizon strategies from scratch, which hinders diverse exploration and task completion in open worlds. To address this, we propose Odyssey, a framework that equips LLM-based agents with open-world skills. It comprises: (1) an interactive agent with an open-world skill library of 40 primitive skills and 183 compositional skills; (2) a LLaMA-3 model fine-tuned on a question-answering dataset of more than 390k instruction entries derived from the Minecraft Wiki; and (3) a new agent capability benchmark covering three tasks: long-term planning, dynamic-immediate planning, and autonomous exploration. Experiments show that the framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly released to advance research on autonomous agents.
📝 Abstract
Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, which requires them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) an interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills; (2) a fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki; and (3) a new agent capability benchmark that includes the long-term planning task, the dynamic-immediate planning task, and the autonomous exploration task. Extensive experiments demonstrate that the proposed Odyssey framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available to motivate future research on more advanced autonomous agent solutions.