MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data

📅 2024-12-18
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the scarcity of real-world data and the high cost of annotation in 3D scene reconstruction. The authors propose a purely geometry-driven procedural synthesis paradigm to generate MegaSynth, a massive, semantics-free 3D scene dataset of 700K diverse, controllable, and scalable scenes, constructed solely from geometric primitives and spatial structural rules, without semantic modeling. Crucially, the results suggest that low-level geometric reconstruction does not require semantic priors and that synthetic data alone can effectively substitute for real data in training. Pre-training or jointly training Large Reconstruction Models (LRMs) with MegaSynth yields PSNR gains of 1.2–1.8 dB over baselines, and models trained solely on synthetic data match those trained on real data while exhibiting markedly improved training stability and cross-domain generalization.

📝 Abstract
We propose scaling up 3D scene reconstruction by training with synthesized data. At the core of our work is MegaSynth, a procedurally generated 3D dataset comprising 700K scenes - over 50 times larger than the prior real dataset DL3DV - dramatically scaling the training data. To enable scalable data generation, our key idea is eliminating semantic information, removing the need to model complex semantic priors such as object affordances and scene composition. Instead, we model scenes with basic spatial structures and geometry primitives, offering scalability. In addition, we control data complexity to facilitate training while loosely aligning it with the real-world data distribution to benefit real-world generalization. We explore training LRMs with both MegaSynth and available real data. Experimental results show that joint training or pre-training with MegaSynth improves reconstruction quality by 1.2 to 1.8 dB PSNR across diverse image domains. Moreover, models trained solely on MegaSynth perform comparably to those trained on real data, underscoring the low-level nature of 3D reconstruction. Additionally, we provide an in-depth analysis of MegaSynth's properties for enhancing model capability, training stability, and generalization.
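To make the semantics-free generation idea concrete, here is a minimal illustrative sketch of procedural scene synthesis from geometric primitives with a simple spatial structural rule. This is not the actual MegaSynth pipeline; all names (`Primitive`, `synthesize_scene`) and the specific rules (clamping object extents to a fraction of the room, keeping centers inside the room) are assumptions chosen to illustrate the paradigm of controllable, scalable, semantics-free data generation.

```python
import random
from dataclasses import dataclass

@dataclass
class Primitive:
    shape: str          # "box", "sphere", or "cylinder"
    center: tuple       # (x, y, z) position inside the room
    scale: tuple        # per-axis extents
    rotation_z: float   # yaw in degrees

def synthesize_scene(num_primitives: int, room_size: float = 10.0, seed: int = 0):
    """Generate a semantics-free scene: random geometric primitives placed
    inside a cubic room. Object extents are clamped to a fraction of the
    room so sizes stay plausible - a loose structural rule standing in for
    semantic priors like affordances or scene composition."""
    rng = random.Random(seed)
    shapes = ["box", "sphere", "cylinder"]
    scene = []
    for _ in range(num_primitives):
        # Extent bounds both object size (complexity control) and valid placement.
        extent = rng.uniform(0.1, 0.3) * room_size
        scene.append(Primitive(
            shape=rng.choice(shapes),
            center=tuple(rng.uniform(extent, room_size - extent) for _ in range(3)),
            scale=(extent, extent * rng.uniform(0.5, 1.5), extent),
            rotation_z=rng.uniform(0.0, 360.0),
        ))
    return scene

# Scaling to 700K scenes is then just enumerating seeds; num_primitives
# is one knob for controlling per-scene complexity.
scene = synthesize_scene(num_primitives=8, seed=42)
print(len(scene), scene[0].shape)
```

Because every scene is a pure function of its seed and complexity parameters, generation parallelizes trivially, which is what makes this paradigm scale where semantic modeling does not.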
Problem

Research questions and friction points this paper is trying to address.

Real-world 3D scene data is scarce and costly to capture, limiting the scale at which reconstruction models can be trained.
Modeling semantic priors such as object affordances and scene composition makes synthetic data generation hard to scale.
It is unclear whether synthetic data can substitute for, or complement, real data when training reconstruction models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

MegaSynth: a procedurally synthesized 3D dataset of 700K scenes, over 50x larger than the prior real dataset DL3DV
Semantics-free generation built solely from geometric primitives, removing the need to model semantic priors
Basic spatial structures with controllable complexity, loosely aligned with the real-world data distribution