🤖 AI Summary
This work addresses three key challenges in single-image 3D object reconstruction: handling occlusions, maintaining geometric consistency, and preserving fine detail. We propose a two-stage framework: first, a lightweight point diffusion model generates sparse yet robust 3D point clouds; second, an image-point-cloud joint feature fusion module, coupled with an implicit surface reconstruction network, produces high-fidelity, geometrically stable meshes. Our core contribution is the integration of point clouds as an editable intermediate representation within single-image reconstruction, enabling cross-modal alignment and real-time user interaction while preserving probabilistic modeling capabilities. Evaluated on multiple benchmarks, our method surpasses state-of-the-art approaches while achieving inference in only 0.7 seconds. It supports user-driven point cloud editing and delivers high-quality mesh outputs, demonstrating both computational efficiency and expressive geometric modeling.
📝 Abstract
We study the problem of single-image 3D object reconstruction. Recent works have diverged into two directions: regression-based modeling and generative modeling. Regression methods efficiently infer visible surfaces, but struggle with occluded regions. Generative methods handle uncertain regions better by modeling distributions, but are computationally expensive and their outputs are often misaligned with visible surfaces. In this paper, we present SPAR3D, a novel two-stage approach aiming to take the best of both directions. The first stage of SPAR3D generates sparse 3D point clouds using a lightweight point diffusion model with fast sampling. The second stage uses both the sampled point cloud and the input image to create highly detailed meshes. Our two-stage design enables probabilistic modeling of the ill-posed single-image 3D task while maintaining computational efficiency and high output fidelity. Using point clouds as an intermediate representation further allows for interactive user edits. Evaluated on diverse datasets, SPAR3D demonstrates superior performance over previous state-of-the-art methods, at an inference speed of 0.7 seconds. Project page with code and model: https://spar3d.github.io
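The two-stage pipeline described above can be sketched in code. The following is a minimal, illustrative mock-up, not the released SPAR3D implementation: every name (`sample_point_cloud`, `reconstruct_mesh`, the stand-in denoiser and implicit surface) is a hypothetical placeholder, and the learned networks are replaced by trivial closed-form stand-ins so the control flow of "diffuse a sparse point cloud, then fuse it with image features for mesh reconstruction" is visible.

```python
import numpy as np

# Illustrative sketch of a two-stage single-image 3D pipeline in the
# spirit of SPAR3D. All names and numeric choices are assumptions; the
# learned networks are replaced by simple stand-in functions.

def sample_point_cloud(image_feat: np.ndarray, n_points: int = 512,
                       n_steps: int = 8, seed: int = 0) -> np.ndarray:
    """Stage 1 (sketch): a lightweight point diffusion model.

    Start from Gaussian noise and iteratively denoise toward a sparse
    3D point cloud conditioned on image features. A real model would
    predict noise with a learned network; here a fixed pull toward a
    conditioning-dependent target stands in for it.
    """
    rng = np.random.default_rng(seed)
    points = rng.standard_normal((n_points, 3))  # x_T ~ N(0, I)
    for t in range(n_steps, 0, -1):
        target = np.tanh(image_feat[:3])          # fake conditioning signal
        points = points + (target - points) / (t + 1)  # partial denoise step
    return points

def reconstruct_mesh(image_feat: np.ndarray, points: np.ndarray) -> np.ndarray:
    """Stage 2 (sketch): fuse image and point-cloud features, then query
    an implicit surface for geometry. A sphere whose radius depends on
    the fused features stands in for the implicit surface network."""
    fused = np.concatenate([image_feat[:3], points.mean(axis=0)])
    radius = 0.5 + 0.5 * np.abs(fused).mean()
    # "Mesh" placeholder: project sampled points onto the implicit surface.
    dirs = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-8)
    return radius * dirs

image_feat = np.full(8, 0.1)                       # stand-in encoded image
pts = sample_point_cloud(image_feat)               # stage 1: sparse points
mesh_vertices = reconstruct_mesh(image_feat, pts)  # stage 2: detailed mesh
print(pts.shape, mesh_vertices.shape)              # (512, 3) (512, 3)
```

Because the point cloud is an explicit intermediate, a user could edit `pts` (move, add, or delete points) before calling the second stage, which is the interaction mechanism the abstract highlights.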