🤖 AI Summary
Existing single-image 3D scene reconstruction methods struggle to produce editable, physically consistent textured meshes: they suffer from erroneous object decomposition, inaccurate spatial relationships, and missing backgrounds, and thus fail to meet the requirements of industrial film and game production. This paper introduces the first end-to-end framework for editable 3D asset generation. It enforces geometric plausibility via a novel 4-DoF differentiable ground-plane constraint, models occlusion recovery as a generative image-editing task, and achieves, for the first time, background-driven spatially consistent reconstruction, yielding illumination-coherent, simulation-ready, fully textured meshes. The method integrates state-of-the-art modules from complementary domains, including object detection, monocular depth estimation, NeRF/3D Gaussian Splatting reconstruction, diffusion-based generation, and differentiable geometric optimization. It sets a new state of the art in single-image 3D scene reconstruction, producing structurally sound, texture-accurate, physically plausible, and post-editable 3D scenes compatible with standard pipelines.
📝 Abstract
Recent advances in 3D scene generation produce visually appealing output, but current representations hinder artists' workflows, which require modifiable 3D textured-mesh scenes for visual effects and game development. Despite significant progress, current textured-mesh scene reconstruction methods are far from artist-ready, suffering from incorrect object decomposition, inaccurate spatial relationships, and missing backgrounds. We present 3D-RE-GEN, a compositional framework that reconstructs a single image into textured 3D objects and a background. We show that combining state-of-the-art models from specific domains achieves state-of-the-art scene reconstruction performance while addressing artists' requirements.
Our reconstruction pipeline integrates models for asset detection, reconstruction, and placement, pushing some of them beyond their originally intended domains. Recovering occluded objects is treated as an image-editing task: generative models infer hidden content with scene-level reasoning under consistent lighting and geometry before reconstruction. Unlike current methods, 3D-RE-GEN generates a comprehensive background that spatially constrains objects during optimization and provides a foundation for realistic lighting and simulation in visual effects and games. To obtain physically realistic layouts, we employ a novel 4-DoF differentiable optimization that aligns reconstructed objects with the estimated ground plane. 3D-RE-GEN achieves state-of-the-art performance in single-image 3D scene reconstruction, producing coherent, modifiable scenes through compositional generation guided by precise camera recovery and spatial optimization.
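To make the 4-DoF ground-plane alignment concrete, here is a minimal toy sketch. The abstract does not specify the paper's parameterization or loss, so this version assumes the four degrees of freedom are 3D translation plus yaw about the plane normal, uses a simple contact-plus-penetration loss, and substitutes central finite differences for true automatic differentiation; all names (`ground_loss`, `optimize_pose`, the toy cube) are illustrative, not the paper's implementation.

```python
import numpy as np

def yaw_rotation(theta):
    """Rotation about the up (y) axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def transform(verts, params):
    """Apply the assumed 4-DoF pose: yaw rotation, then translation."""
    tx, ty, tz, theta = params
    return verts @ yaw_rotation(theta).T + np.array([tx, ty, tz])

def ground_loss(params, verts, normal, offset):
    """Pull the lowest vertex onto the plane; penalize penetration below it."""
    sd = transform(verts, params) @ normal + offset   # signed distances to plane
    contact = np.min(sd) ** 2                         # base should touch the plane
    penetration = np.mean(np.clip(-sd, 0.0, None) ** 2)  # nothing below the plane
    return contact + 10.0 * penetration

def optimize_pose(verts, normal, offset, params, lr=0.1, steps=400, eps=1e-4):
    """Gradient descent with finite differences (stand-in for autodiff)."""
    params = np.asarray(params, dtype=float)
    for _ in range(steps):
        grad = np.zeros_like(params)
        for i in range(len(params)):
            e = np.zeros_like(params)
            e[i] = eps
            grad[i] = (ground_loss(params + e, verts, normal, offset)
                       - ground_loss(params - e, verts, normal, offset)) / (2 * eps)
        params -= lr * grad
    return params

# Toy example: a unit cube floating above the ground plane y = 0.
cube = np.array([[x, y, z] for x in (-0.5, 0.5)
                           for y in (-0.5, 0.5)
                           for z in (-0.5, 0.5)])
normal, offset = np.array([0.0, 1.0, 0.0]), 0.0
pose = optimize_pose(cube, normal, offset, [0.0, 2.0, 0.0, 0.3])
rest = transform(cube, pose) @ normal + offset
print(np.min(rest))  # lowest vertex settles onto the plane, ~0
```

In a real pipeline the same idea would run with an autodiff framework over the reconstructed meshes, with the ground plane estimated from the recovered camera and background geometry rather than fixed at y = 0.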