Odyssey: Empowering Minecraft Agents with Open-World Skills

📅 2024-07-22
📈 Citations: 3
Influential: 0
🤖 AI Summary
Existing Minecraft agents work with narrowly defined action sets and must learn long-horizon strategies from scratch, which makes it hard to discover diverse gameplay in the open world. To address this, the paper proposes Odyssey, a framework that equips LLM-based agents with open-world skills. It contributes: (1) an interactive agent with an open-world skill library of 40 primitive skills and 183 compositional skills; (2) a LLaMA-3 model fine-tuned on a question-answering dataset of 390k+ instruction entries derived from the Minecraft Wiki; and (3) a new agent capability benchmark covering a long-term planning task, a dynamic-immediate planning task, and an autonomous exploration task. Experiments show that the framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available.

📝 Abstract
Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, requiring them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183 compositional skills. (2) A fine-tuned LLaMA-3 model trained on a large question-answering dataset with 390k+ instruction entries derived from the Minecraft Wiki. (3) A new agent capability benchmark that includes the long-term planning task, the dynamic-immediate planning task, and the autonomous exploration task. Extensive experiments demonstrate that the proposed Odyssey framework can effectively evaluate different capabilities of LLM-based agents. All datasets, model weights, and code are publicly available to motivate future research on more advanced autonomous agent solutions.
Problem

Research questions and friction points this paper is trying to address.

Develop generalist agents for open-world Minecraft environments
Overcome limited action sets for long-horizon strategy learning
Enable diverse gameplay exploration with advanced LLM-based skills
Innovation

Methods, ideas, or system contributions that make the work stand out.

Open-world skill library with 223 skills (40 primitive, 183 compositional)
Fine-tuned LLaMA-3 on 390k+ Minecraft instructions
New benchmark for agent capability evaluation
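
The split between primitive and compositional skills can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class name SkillLibrary, the helper functions, and the skill names (move_to, mine_block, craft_item, collect_wood, craft_wooden_pickaxe) are hypothetical and are not taken from the Odyssey code base. The sketch only shows one plausible way a compositional skill could be resolved into an ordered sequence of primitive skills.

```python
from typing import Callable, Dict, List

# Hypothetical type: a primitive skill is a callable that records its effect in a log.
Primitive = Callable[[List[str]], None]


class SkillLibrary:
    """Toy skill library (illustrative only, not Odyssey's API)."""

    def __init__(self) -> None:
        self.primitives: Dict[str, Primitive] = {}
        # A compositional skill is stored as an ordered list of other skill names.
        self.compositions: Dict[str, List[str]] = {}

    def add_primitive(self, name: str, fn: Primitive) -> None:
        self.primitives[name] = fn

    def add_composition(self, name: str, steps: List[str]) -> None:
        self.compositions[name] = steps

    def run(self, name: str, log: List[str]) -> None:
        """Execute a skill; compositional skills expand recursively into primitives."""
        if name in self.primitives:
            self.primitives[name](log)
        elif name in self.compositions:
            for step in self.compositions[name]:
                self.run(step, log)
        else:
            raise KeyError(f"unknown skill: {name}")


def make_primitive(name: str) -> Primitive:
    """Build a stand-in primitive that only appends its own name to the log."""
    def act(log: List[str]) -> None:
        log.append(name)
    return act


def build_demo_library() -> SkillLibrary:
    """Assemble a tiny library with made-up skill names."""
    lib = SkillLibrary()
    for name in ("move_to", "mine_block", "craft_item"):
        lib.add_primitive(name, make_primitive(name))
    lib.add_composition("collect_wood", ["move_to", "mine_block"])
    lib.add_composition("craft_wooden_pickaxe", ["collect_wood", "craft_item"])
    return lib


if __name__ == "__main__":
    library = build_demo_library()
    trace: List[str] = []
    library.run("craft_wooden_pickaxe", trace)
    print(trace)
```

In this toy version an LLM planner would only need to choose skill names, while the library takes care of expanding them. How Odyssey actually stores, retrieves, and executes its 223 skills is not described on this page.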