🤖 AI Summary
High-fidelity 3D scene reconstruction from a single image remains challenging because existing methods rely on coarse-grained representations and on large-scale scene datasets. This work proposes a decomposition-based strategy: object-level diffusion models first generate geometry and appearance for each object in the input image, and a scene-level stage then integrates and refines the objects, guided by differentiable rendering and diffusion priors. The method does not require large-scale scene training data, and the authors report high-quality results in both geometry and novel view synthesis, supporting downstream applications such as interior design.
📝 Abstract
In this paper, we introduce DecoRec, a novel system designed to lift a single-view 2D image to a decomposed 3D scene mesh. Current methods for single-view scene reconstruction typically rely on object retrieval or on regressing coarse 3D voxels or surfaces, which leads to inaccuracies in capturing the appearance and geometry shown in the input image. The lack of high-quality, large-scale scene-level datasets further complicates direct 3D scene generation from single-view images. To achieve high-quality 3D scene generation from a single-view image, DecoRec takes advantage of recent diffusion-based single-view object reconstruction methods to reconstruct individual objects separately. A refinement pipeline then merges the reconstructed objects, enhancing their appearance and geometry through differentiable rendering and diffusion-guided refinement. Our results demonstrate that DecoRec facilitates high-quality single-view scene reconstruction in both geometry and novel view synthesis, offering significant benefits for downstream applications such as room interior design.
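The reconstruct-then-refine structure described in the abstract can be illustrated with a toy sketch. This is not the DecoRec implementation: every name here (`Placement`, `project_box`, `refine_object`, `refine_scene`) is hypothetical, the per-object diffusion reconstruction is replaced by a canonical unit square, the differentiable renderer is replaced by an analytically differentiable 2D box projection, and the diffusion prior is omitted. It only shows how each object's placement might be fitted to image evidence by gradient descent before the objects are merged into one scene.

```python
from dataclasses import dataclass


@dataclass
class Placement:
    """Hypothetical per-object pose: uniform scale plus 2D translation."""
    scale: float = 1.0
    tx: float = 0.0
    ty: float = 0.0


def project_box(p):
    """'Render' a canonical unit square under the placement.

    Stand-in for a differentiable renderer; returns (xmin, ymin, xmax, ymax).
    """
    h = 0.5 * p.scale
    return (p.tx - h, p.ty - h, p.tx + h, p.ty + h)


def box_loss(p, target):
    """Squared error between the projected box and the observed box."""
    return sum((a - b) ** 2 for a, b in zip(project_box(p), target))


def refine_object(target, steps=200, lr=0.1):
    """Fit one object's placement to its observed box by gradient descent."""
    p = Placement()
    for _ in range(steps):
        box = project_box(p)
        r = [a - b for a, b in zip(box, target)]  # residual per box edge
        # Analytic gradients of box_loss with respect to each parameter.
        g_tx = 2.0 * (r[0] + r[2])
        g_ty = 2.0 * (r[1] + r[3])
        g_s = 2.0 * 0.5 * (-r[0] - r[1] + r[2] + r[3])
        p.tx -= lr * g_tx
        p.ty -= lr * g_ty
        p.scale -= lr * g_s
    return p


def refine_scene(observed_boxes):
    """Refine every object independently, then collect them into one scene."""
    return {name: refine_object(box) for name, box in observed_boxes.items()}


# Assumed input: per-object boxes, e.g. derived from instance masks.
scene = refine_scene({
    "chair": (1.0, 2.0, 3.0, 4.0),
    "table": (-2.0, -1.0, 2.0, 3.0),
})
```

In a real system the loss would compare rendered images, masks, or depth against the input view, and an automatic differentiation framework would supply the gradients instead of the hand-derived ones used here.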