MolmoSpaces: A Large-Scale Open Ecosystem for Robot Navigation and Manipulation

📅 2026-02-11
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing robotic benchmarks struggle to capture the vast diversity of scene layouts, object geometries, and task specifications encountered in real-world environments, limiting evaluation of policy generalization in long-tail everyday scenarios. To address this, we propose MolmoSpaces, a large-scale, open, simulator-agnostic ecosystem that integrates over 230,000 diverse indoor scenes and 130,000 richly annotated objects, including 48,000 manipulable items with 42 million stable grasp poses, together with MolmoSpaces-Bench, an eight-task benchmark suite spanning navigation, static and mobile manipulation, and cross-room long-horizon tasks. Built on procedural generation, a multi-simulator architecture (MuJoCo, Isaac, ManiSkill), and standardized interfaces, the ecosystem ensures high reproducibility and demonstrates strong sim-to-real correlation (R = 0.96, ρ = 0.98), validating zero-shot policy evaluation and revealing the critical impact of prompt phrasing, initial joint positions, and camera occlusion on task success.
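The "multi-simulator compatible architecture" mentioned above suggests an adapter layer in which benchmark code targets one narrow scene interface while each simulator supplies its own backend. The sketch below illustrates that pattern only; SceneBackend, MuJoCoBackend, and the scene-loading flow are hypothetical names, not the actual MolmoSpaces API (only the mujoco calls are real functions from the MuJoCo Python bindings).

```python
# Illustrative sketch of a simulator-agnostic adapter layer (hypothetical names,
# not the MolmoSpaces API). Policies and evaluation code talk to SceneBackend;
# each simulator provides its own implementation.
from typing import Protocol

import numpy as np


class SceneBackend(Protocol):
    """Minimal per-simulator interface a benchmark harness could target."""

    def reset(self) -> np.ndarray: ...
    def step(self, action: np.ndarray) -> np.ndarray: ...


class MuJoCoBackend:
    """Hypothetical MuJoCo adapter; Isaac or ManiSkill adapters would mirror it."""

    def __init__(self, scene_xml: str):
        import mujoco  # MuJoCo Python bindings

        self._mujoco = mujoco
        self.model = mujoco.MjModel.from_xml_path(scene_xml)  # parse the MJCF scene
        self.data = mujoco.MjData(self.model)

    def reset(self) -> np.ndarray:
        self._mujoco.mj_resetData(self.model, self.data)  # restore the initial state
        return np.copy(self.data.qpos)

    def step(self, action: np.ndarray) -> np.ndarray:
        self.data.ctrl[:] = action                   # apply actuator commands
        self._mujoco.mj_step(self.model, self.data)  # advance physics one step
        return np.copy(self.data.qpos)
```

An Isaac or ManiSkill backend would implement the same two methods against its own engine, so evaluation code and policies would stay unchanged across simulators.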

📝 Abstract
Deploying robots at scale demands robustness to the long tail of everyday situations. The countless variations in scene layout, object geometry, and task specifications that characterize real environments are vast and underrepresented in existing robot benchmarks. Measuring this level of generalization requires infrastructure at a scale and diversity that physical evaluation alone cannot provide. We introduce MolmoSpaces, a fully open ecosystem to support large-scale benchmarking of robot policies. MolmoSpaces consists of over 230k diverse indoor environments, ranging from handcrafted household scenes to procedurally generated multiroom houses, populated with 130k richly annotated object assets, including 48k manipulable objects with 42M stable grasps. Crucially, these environments are simulator-agnostic, supporting popular options such as MuJoCo, Isaac, and ManiSkill. The ecosystem supports the full spectrum of embodied tasks: static and mobile manipulation, navigation, and multiroom long-horizon tasks requiring coordinated perception, planning, and interaction across entire indoor environments. We also design MolmoSpaces-Bench, a benchmark suite of 8 tasks in which robots interact with our diverse scenes and richly annotated objects. Our experiments show MolmoSpaces-Bench exhibits strong sim-to-real correlation (R = 0.96, ρ = 0.98), confirm newer and stronger zero-shot policies outperform earlier versions in our benchmarks, and identify key sensitivities to prompt phrasing, initial joint positions, and camera occlusion. Through MolmoSpaces and its open-source assets and tooling, we provide a foundation for scalable data generation, policy training, and benchmark creation for robot learning research.
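For context on the reported correlation figures, here is a minimal sketch of how sim-to-real agreement of this kind is typically computed: Pearson R and Spearman ρ over per-task success rates measured in simulation versus on real hardware. The success-rate values below are invented for illustration and are not results from the paper.

```python
# Hedged sketch: correlate simulated and real-world success rates per task.
# The numbers are made up for illustration only.
from scipy.stats import pearsonr, spearmanr

# Hypothetical success rates, one entry per benchmark task/policy pairing.
sim_success = [0.82, 0.55, 0.91, 0.40, 0.67, 0.73, 0.30, 0.88]
real_success = [0.78, 0.50, 0.89, 0.35, 0.60, 0.70, 0.28, 0.85]

r, _ = pearsonr(sim_success, real_success)      # linear (Pearson) correlation R
rho, _ = spearmanr(sim_success, real_success)   # rank (Spearman) correlation rho
print(f"Pearson R = {r:.2f}, Spearman rho = {rho:.2f}")
```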
Problem

Research questions and friction points this paper is trying to address.

robot generalization
large-scale benchmarking
embodied tasks
sim-to-real transfer
diverse indoor environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

large-scale simulation
simulator-agnostic robotics
grasp-rich object assets
embodied AI benchmarking
sim-to-real correlation
Authors

Yejin Kim (Ai2)
Wilbert Pumacay (Allen Institute for AI)
Omar Rayyan (UCLA): Robotics, Machine Learning
Max Argus (University of Freiburg): CV, ML, Robotics
Winson Han (Allen Institute for AI)
Eli VanderBilt (Technical Artist)
Jordi Salvador (Allen Institute for AI): Computer Vision, Machine Learning, Embodied AI
Abhay Deshpande (Allen Institute for Artificial Intelligence): Robotics, Machine Learning
Rose Hendrix (Research Engineer @ PRIOR, AI2): Robotics, Machine Learning
Snehal Jauhri (Technische Universität Darmstadt): Robotics, Machine Learning, Computer Vision
Shuo Liu (University of Washington, Allen Institute of AI): Robotics, Artificial Intelligence
Nur Muhammad Mahi Shafiullah (Postdoctoral Researcher, Meta AI & Berkeley AI Research (BAIR)): Robotics, Machine Learning
Maya Guru (Allen Institute for AI)
Ainaz Eftekhar (PhD Student, University of Washington): Computer Vision, Reinforcement Learning, Embodied AI, Robotics, Machine Learning
Karen Farley (Allen Institute for AI)
Donovan Clay (University of Washington)
Jiafei Duan (Computer Science PhD Student, University of Washington): Robotics, Robot Learning, Embodied AI, Robotic Manipulation
Arjun Guru (University of Washington)
Piper Wolters (Research Engineer, Allen Institute for AI): Computer Vision, Deep Learning
Alvaro Herrasti (Research Engineer): Deep Learning, Reinforcement Learning, Computer Vision, Natural Language Processing
Ying-Chun Lee (University of Washington)
Georgia Chalvatzaki (Professor for Interactive Robot Perception and Learning, Technische Universität Darmstadt): Robotics, Machine Learning, Reinforcement Learning, Robot Perception, HRI
Yuchen Cui (University of California, Los Angeles)
Ali Farhadi (Professor, Computer Science and Engineering, University of Washington): Computer Vision, Machine Learning, Artificial Intelligence
Dieter Fox (University of Washington and AI2): Robotics, Artificial Intelligence, Computer Vision