ManiSkill3: GPU Parallelized Robotics Simulation and Rendering for Generalizable Embodied AI

📅 2024-10-01
🏛️ arXiv.org
📈 Citations: 33
✨ Influential: 3
🤖 AI Summary
Existing robot simulation frameworks support only a narrow range of scenes and tasks and lack features needed to scale generalizable robot learning and sim-to-real transfer. To address this, the paper introduces ManiSkill3, an open-source, GPU-parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. The framework couples GPU-parallel simulation with the SAPIEN parallel rendering system and supports heterogeneous simulation and point-cloud/voxel visual inputs. It provides GPU-parallelized environments spanning 12 distinct domains, including mobile manipulation, drawing, humanoids, and dexterous manipulation in artist-designed scenes and real-world digital twins, along with millions of demonstration frames from motion planning, RL, and teleoperation. With minimal Python/PyTorch overhead, simulation with rendering runs 10-1000x faster than other platforms with 2-3x less GPU memory usage, reaching 30,000+ FPS in benchmarked environments; tasks that used to take hours to train now take minutes. The platform also ships baselines covering popular RL and learning-from-demonstrations algorithms.

๐Ÿ“ Abstract
Simulation has enabled unprecedented compute-scalable approaches to robot learning. However, existing simulation frameworks typically support a narrow range of scenes/tasks and lack features critical for scaling generalizable robotics and sim-to-real transfer. We introduce and open-source ManiSkill3, the fastest state-visual GPU-parallelized robotics simulator with contact-rich physics targeting generalizable manipulation. ManiSkill3 supports GPU parallelization of many aspects including simulation+rendering, heterogeneous simulation, pointcloud/voxel visual input, and more. Simulation with rendering on ManiSkill3 can run 10-1000x faster with 2-3x less GPU memory usage than other platforms, achieving up to 30,000+ FPS in benchmarked environments thanks to minimal Python/PyTorch overhead, simulation on the GPU, and the SAPIEN parallel rendering system. Tasks that used to take hours to train can now take minutes. We further provide the most comprehensive range of GPU-parallelized environments/tasks spanning 12 distinct domains, including but not limited to mobile manipulation for tasks such as drawing, humanoids, and dexterous manipulation in realistic scenes designed by artists or built as real-world digital twins. In addition, millions of demonstration frames are provided from motion planning, RL, and teleoperation. ManiSkill3 also provides a comprehensive set of baselines spanning popular RL and learning-from-demonstrations algorithms.
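The abstract's central idea, stepping thousands of environments as one batched operation so per-step framework overhead is amortized, can be illustrated with a toy sketch. This is not ManiSkill3's API; `BatchedPointMassEnv` and its reward are hypothetical, and NumPy stands in for GPU tensors to keep the example self-contained.

```python
import numpy as np

class BatchedPointMassEnv:
    """Toy vectorized environment: N point masses integrated with one
    batched array operation per step, mimicking how GPU-parallel
    simulators amortize overhead across thousands of environments."""

    def __init__(self, num_envs: int, dt: float = 0.01):
        self.num_envs = num_envs
        self.dt = dt
        self.pos = np.zeros((num_envs, 2))
        self.vel = np.zeros((num_envs, 2))

    def reset(self, mask=None):
        # Partial resets: only environments flagged in `mask` restart,
        # so the batch never stalls waiting on finished episodes.
        if mask is None:
            mask = np.ones(self.num_envs, dtype=bool)
        self.pos[mask] = 0.0
        self.vel[mask] = 0.0
        return self.pos.copy()

    def step(self, actions):
        # One batched semi-implicit Euler step for all environments.
        self.vel += actions * self.dt
        self.pos += self.vel * self.dt
        dist = np.linalg.norm(self.pos - 1.0, axis=-1)
        reward = -dist                # negative distance to goal (1, 1)
        done = dist < 0.05
        return self.pos.copy(), reward, done

env = BatchedPointMassEnv(num_envs=4096)
obs = env.reset()
obs, reward, done = env.step(np.ones((4096, 2)))
print(obs.shape, reward.shape)  # (4096, 2) (4096,)
```

The same pattern underlies vectorized RL training loops: the policy receives a `(num_envs, obs_dim)` batch and emits a `(num_envs, act_dim)` batch, so simulation and inference both run as single batched calls.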
Problem

Research questions and friction points this paper is trying to address.

Existing simulators support only a narrow range of scenes/tasks, limiting generalizable manipulation research
Simulation and rendering are too slow and memory-hungry for compute-scalable robot learning
Comprehensive GPU-parallelized environments and large demonstration datasets are scarce
Innovation

Methods, ideas, or system contributions that make the work stand out.

GPU parallelized simulation and rendering
Contact-rich physics for generalizable manipulation
SAPIEN parallel rendering system
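One innovation the list names is serving point-cloud visual input directly from parallel rendering. A standard way to produce such observations is batched back-projection of depth maps through the camera intrinsics; the sketch below shows that general technique in NumPy (the function name and pinhole parameters are illustrative, not ManiSkill3 internals).

```python
import numpy as np

def depth_to_pointcloud(depth: np.ndarray, fx: float, fy: float,
                        cx: float, cy: float) -> np.ndarray:
    """Back-project a batch of depth maps (N, H, W) into camera-frame
    point clouds (N, H*W, 3) with one vectorized operation, the pattern
    GPU simulators use to serve point-cloud observations."""
    n, h, w = depth.shape
    # Pixel coordinate grids, each of shape (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    x = (u[None] - cx) * z / fx
    y = (v[None] - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(n, h * w, 3)

# Eight 64x64 depth maps at a constant 2 m produce eight clouds at once.
clouds = depth_to_pointcloud(np.full((8, 64, 64), 2.0),
                             fx=50.0, fy=50.0, cx=32.0, cy=32.0)
print(clouds.shape)  # (8, 4096, 3)
```

Voxel observations follow the same batched pipeline, with the back-projected points scattered into a fixed grid instead of returned as a list.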
Authors

Stone Tao (University of California San Diego)
Fanbo Xiang (University of California San Diego, Hillbot)
Arth Shukla (University of California San Diego)
Yuzhe Qin (University of California San Diego)
Xander Hinrichsen (University of California San Diego)
Xiaodi Yuan (University of California San Diego, Hillbot)
Chen Bao (Carnegie Mellon University)
Xinsong Lin (University of California San Diego)
Yulin Liu (University of California San Diego, Hillbot)
Tse-kai Chan (University of California San Diego)
Yuan Gao (University of California San Diego)
Xuanlin Li (unknown affiliation)
Tongzhou Mu (University of California San Diego)
Nan Xiao (University of California San Diego)
Arnav Gurha (University of California San Diego)
Zhiao Huang (University of California San Diego, Hillbot)
Roberto Calandra (TU Dresden)
Rui Chen (Tsinghua University)
Shan Luo (King's College London)
Hao Su (University of California San Diego, Hillbot)