🤖 AI Summary
To address the challenges of sim-to-real transfer and poor skill generalization for mobile robots in home environments, this paper introduces AgentWorld, an interactive simulation platform for household mobile manipulation. AgentWorld automates the construction of domestic scenes, covering layout generation, semantic asset placement, visual material configuration, and physics simulation, and provides a dual-mode teleoperation system that supports both wheeled bases and humanoid locomotion policies. Data collected with this system forms the AgentWorld Dataset, which spans primitive actions (such as pick-and-place and push-pull) and multistage activities (such as serving drinks and heating food) across living rooms, bedrooms, and kitchens. The authors benchmark several imitation learning methods on the dataset, including behavior cloning, action chunking transformers, diffusion policies, and vision-language-action models, and report that the results demonstrate the dataset's effectiveness for sim-to-real transfer in household mobile manipulation.
📝 Abstract
We introduce AgentWorld, an interactive simulation platform for developing household mobile manipulation capabilities. Our platform combines automated scene construction that encompasses layout generation, semantic asset placement, visual material configuration, and physics simulation, with a dual-mode teleoperation system supporting both wheeled bases and humanoid locomotion policies for data collection. The resulting AgentWorld Dataset captures diverse tasks ranging from primitive actions (pick-and-place, push-pull, etc.) to multistage activities (serve drinks, heat up food, etc.) across living rooms, bedrooms, and kitchens. Through extensive benchmarking of imitation learning methods including behavior cloning, action chunking transformers, diffusion policies, and vision-language-action models, we demonstrate the dataset's effectiveness for sim-to-real transfer. The integrated system provides a comprehensive solution for scalable robotic skill acquisition in complex home environments, bridging the gap between simulation-based training and real-world deployment. The code and datasets will be available at https://yizhengzhang1.github.io/agent_world/