CrafterDojo: A Suite of Foundation Models for Building Open-Ended Embodied Agents in Crafter

📅 2025-08-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Crafter, a lightweight Minecraft-like open-world environment, has so far been used only for narrow tasks because it lacks the foundation models that have driven progress in Minecraft. Method: We introduce CrafterDojo, a prototyping platform for embodied agents in Crafter built around a suite of foundation models: (i) CrafterVPT for behavior priors, (ii) CrafterCLIP for vision-language grounding, and (iii) CrafterSteve-1 for instruction following. We further release open-source data generation toolkits (CrafterPlay and CrafterCaption), reference agent implementations, benchmark evaluations, and a complete codebase. Contribution/Results: The suite brings multimodal perception, behavior priors, and instruction following together in one framework, with the aim of enabling scalable, reproducible research on general-purpose open-world embodied agents.

📝 Abstract
Developing general-purpose embodied agents is a core challenge in AI. Minecraft provides rich complexity and internet-scale data, but its slow speed and engineering overhead make it unsuitable for rapid prototyping. Crafter offers a lightweight alternative that retains key challenges from Minecraft, yet its use has remained limited to narrow tasks due to the absence of foundation models that have driven progress in the Minecraft setting. In this paper, we present CrafterDojo, a suite of foundation models and tools that unlock the Crafter environment as a lightweight, prototyping-friendly, and Minecraft-like testbed for general-purpose embodied agent research. CrafterDojo addresses this gap by introducing CrafterVPT, CrafterCLIP, and CrafterSteve-1 for behavior priors, vision-language grounding, and instruction following, respectively. In addition, we provide toolkits for generating behavior and caption datasets (CrafterPlay and CrafterCaption), reference agent implementations, benchmark evaluations, and a complete open-source codebase.
Problem

Research questions and friction points this paper is trying to address.

Developing general-purpose embodied agents in AI
Providing a lightweight alternative to Minecraft for rapid prototyping
Creating foundation models for the Crafter environment
Innovation

Methods, ideas, or system contributions that make the work stand out.

CrafterVPT provides behavior priors for agents
CrafterCLIP provides vision-language grounding
CrafterSteve-1 supports instruction following
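
One way to picture how these components could fit together is as a pipeline: a text encoder turns an instruction into a goal embedding, and a goal-conditioned policy maps each observation plus that goal to an action. The sketch below is a toy illustration of that data flow only. `ToyTextEncoder`, `ToyGoalPolicy`, `run_episode`, the embedding size, and the action names are all hypothetical stand-ins, and nothing here reflects the actual CrafterDojo models or API.

```python
import hashlib
from typing import List, Sequence

# Hypothetical action subset for illustration; not the real Crafter action list.
ACTIONS = ["noop", "move_left", "move_right", "move_up", "move_down", "do"]


class ToyTextEncoder:
    """Stand-in for a CLIP-style text encoder (the role CrafterCLIP plays)."""

    def __init__(self, dim: int = 8) -> None:
        self.dim = dim

    def encode(self, text: str) -> List[float]:
        # Deterministic pseudo-embedding from a hash; a real encoder is learned.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        return [b / 255.0 for b in digest[: self.dim]]


class ToyGoalPolicy:
    """Stand-in for an instruction-conditioned policy (the role CrafterSteve-1 plays)."""

    def act(self, observation: Sequence[float], goal: Sequence[float]) -> str:
        # A real policy would be a neural network; this just mixes both inputs
        # so the chosen action depends on the observation and the goal.
        score = sum(observation) + sum(goal)
        return ACTIONS[int(score * 1000) % len(ACTIONS)]


def run_episode(
    instruction: str,
    observations: Sequence[Sequence[float]],
    encoder: ToyTextEncoder,
    policy: ToyGoalPolicy,
) -> List[str]:
    # The goal embedding is computed once and reused at every step.
    goal = encoder.encode(instruction)
    return [policy.act(obs, goal) for obs in observations]
```

For example, `run_episode("collect wood", [[0.1, 0.2], [0.3, 0.4]], ToyTextEncoder(), ToyGoalPolicy())` returns one action name per observation. The point of the sketch is the interface boundary: the instruction is encoded once, and the policy only ever sees the observation and the goal embedding.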