MetaScenes: Towards Automated Replica Creation for Real-world 3D Scans

📅 2025-05-05
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Embodied AI research is held back by 3D scene construction that is manual, hard to scale, and generalizes poorly. Method: This paper introduces MetaScenes, the first large-scale, interactive 3D scene benchmark derived from real scans, containing 15,366 objects across 831 fine-grained categories, together with Scan2Sim, a multimodal alignment model that integrates CLIP and point-cloud encoders to automatically replace scanned objects with high-fidelity, semantically consistent, simulation-ready assets. The approach further incorporates joint geometric-semantic modeling, differentiable scene synthesis, and physics-aware optimization. Results: Experiments demonstrate substantial improvements in cross-domain transfer and sim-to-real generalization for robotic manipulation and vision-language navigation tasks, consistently outperforming manually constructed scene baselines across multiple benchmarks.

📝 Abstract
Embodied AI (EAI) research requires high-quality, diverse 3D scenes to effectively support skill acquisition, sim-to-real transfer, and generalization. Achieving these quality standards, however, necessitates the precise replication of real-world object diversity. Existing datasets demonstrate that this process heavily relies on artist-driven designs, which demand substantial human effort and present significant scalability challenges. To scalably produce realistic and interactive 3D scenes, we first present MetaScenes, a large-scale, simulatable 3D scene dataset constructed from real-world scans, which includes 15,366 objects spanning 831 fine-grained categories. Then, we introduce Scan2Sim, a robust multi-modal alignment model, which enables the automated, high-quality replacement of assets, thereby eliminating the reliance on artist-driven designs for scaling 3D scenes. We further propose two benchmarks to evaluate MetaScenes: a detailed scene synthesis task focused on small item layouts for robotic manipulation and a domain transfer task in vision-and-language navigation (VLN) to validate cross-domain transfer. Results confirm MetaScenes' potential to enhance EAI by supporting more generalizable agent learning and sim-to-real applications, introducing new possibilities for EAI research. Project website: https://meta-scenes.github.io/.
Problem

Research questions and friction points this paper is trying to address.

Automating high-quality 3D scene replication for Embodied AI
Reducing reliance on artist-driven designs for scalability
Enhancing sim-to-real transfer and generalization in 3D environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale simulatable 3D scene dataset
Automated asset replacement via Scan2Sim
Multi-modal alignment for realistic replication
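
One way to picture alignment-based asset replacement is as nearest-neighbor retrieval in a shared embedding space: a scanned object and every candidate asset are embedded, and the asset most similar to the scan is selected as its replacement. The sketch below is only an illustration under that assumption, not the Scan2Sim implementation. The function names, asset identifiers, and three-dimensional vectors are all hypothetical stand-ins for the outputs of real image and point-cloud encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_assets(scan_embedding, asset_embeddings):
    """Return (asset_id, score) pairs sorted from most to least similar to the scan."""
    scored = [
        (asset_id, cosine_similarity(scan_embedding, embedding))
        for asset_id, embedding in asset_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy embeddings; a real system would obtain these from trained encoders.
asset_library = {
    "mug_01": [0.9, 0.1, 0.0],
    "bowl_07": [0.6, 0.6, 0.1],
    "lamp_03": [0.0, 0.2, 0.9],
}
scanned_object = [0.8, 0.2, 0.05]

ranking = rank_assets(scanned_object, asset_library)
best_asset_id = ranking[0][0]  # candidate replacement for the scanned object
print(best_asset_id)
```

Returning the full ranking rather than only the top match leaves room for a later filtering step, for example rejecting candidates whose size or pose does not fit the scanned scene.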
Authors

Huangyue Yu
Beijing Institute for General Artificial Intelligence
computer vision and artificial intelligence

Baoxiong Jia
Ph.D. in Computer Science, UCLA
Computer Vision, Artificial Intelligence

Yixin Chen
State Key Laboratory of General Artificial Intelligence, BIGAI

Yandan Yang
BIGAI (Beijing Institute for General Artificial Intelligence)
Computer Vision, Generation, Embodied AI

Puhao Li
Ph.D. Student, Tsinghua University
Computer Vision, Robotics, Machine Learning

Rongpeng Su
BIGAI
Embodied AI

Jiaxin Li
State Key Laboratory of General Artificial Intelligence, BIGAI, Beijing Institute of Technology

Qing Li
State Key Laboratory of General Artificial Intelligence, BIGAI

Wei Liang
Beijing Institute of Technology

Song-Chun Zhu
State Key Laboratory of General Artificial Intelligence, BIGAI

Tengyu Liu
Beijing Institute for General Artificial Intelligence
computer vision, human object interaction, human motion generation, grasping

Siyuan Huang
State Key Laboratory of General Artificial Intelligence, BIGAI