ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

📅 2024-10-01

🏛️ arXiv.org

📈 Citations: 33

✨ Influential: 3

career value

234K/year

🤖 AI Summary

Existing robot simulation frameworks suffer from narrow task coverage and weak physical/visual modeling capabilities, hindering general embodied intelligence development and efficient sim-to-real transfer. To address this, we propose the first full-stack, GPU-accelerated simulation and rendering platform tailored for general embodied intelligence. Our framework introduces a novel tightly integrated architecture unifying a parallel CUDA-based physics engine, high-fidelity rendering (via SAPIEN extension), and multimodal perception (point cloud/voxel). It enables large-scale heterogeneous-scene parallel simulation, artist-grade digital twin environment construction, and integration of million-scale, multi-source demonstration datasets across 12 contact-rich manipulation tasks. Empirical evaluation achieves >30,000 FPS—10–1000× faster than mainstream frameworks—with 2–3× reduced GPU memory consumption and training time compressed from hours to minutes. The platform fully supports both reinforcement learning and imitation learning baselines.

Technology Category

Application Category

📝 Abstract

Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments due to minimal python/pytorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU parallelized environments/tasks spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dextrous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms.

Problem

Research questions and friction points this paper is trying to address.

Enables GPU-parallelized robotics simulation for generalizable manipulation tasks

Addresses limitations of narrow scene/task support in existing simulators

Provides fast, memory-efficient simulation with comprehensive environments/demonstrations

Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU parallelized simulation and rendering

Contact-rich physics for generalizable manipulation

SAPIEN parallel rendering system

🔎 Similar Papers

Omnigrasp: Grasping Diverse Objects with Simulated Humanoids