๐ค AI Summary
Existing robot simulation frameworks suffer from narrow task coverage and weak physical/visual modeling capabilities, hindering general embodied intelligence development and efficient sim-to-real transfer. To address this, we propose the first full-stack, GPU-accelerated simulation and rendering platform tailored for general embodied intelligence. Our framework introduces a novel tightly integrated architecture unifying a parallel CUDA-based physics engine, high-fidelity rendering (via SAPIEN extension), and multimodal perception (point cloud/voxel). It enables large-scale heterogeneous-scene parallel simulation, artist-grade digital twin environment construction, and integration of million-scale, multi-source demonstration datasets across 12 contact-rich manipulation tasks. Empirical evaluation achieves >30,000 FPSโ10โ1000ร faster than mainstream frameworksโwith 2โ3ร reduced GPU memory consumption and training time compressed from hours to minutes. The platform fully supports both reinforcement learning and imitation learning baselines.
๐ Abstract
Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, many existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim2real. We introduce and open source ManiSkill3, the fastest state-visual GPU parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointclouds/voxels visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments due to minimal python/pytorch overhead in the system, simulation on the GPU, and the use of the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU parallelized environments/tasks spanning 12 distinct domains including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dextrous manipulation in realistic scenes designed by artists or real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines that span popular RL and learning-from-demonstrations algorithms.