🤖 AI Summary
Existing open-source articulated-object datasets suffer from significant deficiencies in visual photorealism and physical fidelity, severely limiting their practical utility for robot learning. To address this, we introduce the first high-fidelity digital-twin dataset of articulated objects specifically designed for robot learning, covering representative indoor scenes while jointly ensuring visual realism, physical accuracy, and modular interactivity. Our approach embeds modular interaction behaviors and pixel-level functional-region (affordance) annotations directly within the assets; leverages USD for unified asset encapsulation; and integrates optical motion-capture validation, PBR-based high-resolution texturing, and fine-grained calibration of rigid-body dynamics parameters. Experiments demonstrate substantial improvements in Sim2Real transfer, with consistent gains across both imitation-learning and reinforcement-learning benchmarks. The complete dataset, including all assets, annotations, and a comprehensive production pipeline, is publicly released under an open-source license.
📝 Abstract
Robot learning increasingly relies on simulation to advance complex abilities such as dexterous manipulation and precise interaction, necessitating high-quality digital assets to bridge the sim-to-real gap. However, existing open-source articulated-object datasets for simulation are limited by insufficient visual realism and low physical fidelity, which hinder their utility for training models to master robotic tasks in the real world. To address these challenges, we introduce ArtVIP, a comprehensive open-source dataset comprising high-quality digital-twin articulated objects, accompanied by indoor-scene assets. Crafted by professional 3D modelers adhering to unified standards, ArtVIP ensures visual realism through precise geometric meshes and high-resolution textures, while physical fidelity is achieved via fine-tuned dynamics parameters. Moreover, the dataset pioneers embedded modular interaction behaviors within assets and pixel-level affordance annotations. Feature-map visualization and optical motion capture are employed to quantitatively demonstrate ArtVIP's visual and physical fidelity, and its applicability is validated across imitation-learning and reinforcement-learning experiments. Provided in USD format with detailed production guidelines, ArtVIP is fully open-source, benefiting the research community and advancing robot learning research. Our project is at https://x-humanoid-artvip.github.io/ .