SC-Diff: 3D Shape Completion with Latent Diffusion Models

📅 2024-03-19

🏛️ arXiv.org

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

This work addresses cross-category 3D shape completion from partial 3D scans. We propose a 3D latent diffusion model inspired by 2D latent diffusion models. Methodologically, we design a 3D latent autoencoder to compress TSDF voxel representations and introduce a dual-path conditioning mechanism: (i) cross-attention fuses multi-view image semantics, and (ii) TSDF spatial feature injection incorporates geometric priors. A multi-scale 3D convolutional decoder improves fine-grained reconstruction fidelity. Our key contributions are threefold: (1) an extension of the latent diffusion paradigm to cross-category 3D completion; (2) a single model trained across object classes, rather than one diffusion model per class, which also generalizes to unseen categories; and (3) accuracy and visual realism on par with state-of-the-art methods on the ShapeNet and KITTI benchmarks using that single unified model, at a higher output resolution than existing diffusion-based approaches.

📝 Abstract

This paper introduces a 3D shape completion approach using a 3D latent diffusion model optimized for completing shapes, represented as Truncated Signed Distance Functions (TSDFs), from partial 3D scans. Our method combines image-based conditioning through cross-attention and spatial conditioning through the integration of 3D features from captured partial scans. This dual guidance enables high-fidelity, realistic shape completions at superior resolutions. At the core of our approach is the compression of 3D data into a low-dimensional latent space using an auto-encoder inspired by 2D latent diffusion models. This compression facilitates the processing of higher-resolution shapes and allows us to apply our model across multiple object classes, a significant improvement over other existing diffusion-based shape completion methods, which often require a separate diffusion model for each class. We validated our approach against two common benchmarks in the field of shape completion, demonstrating competitive performance in terms of accuracy and realism and performing on par with state-of-the-art methods despite operating at a higher resolution with a single model for all object classes. We present a comprehensive evaluation of our model, showcasing its efficacy in handling diverse shape completion challenges, even on unseen object classes. The code will be released upon acceptance.
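
For readers unfamiliar with the TSDF representation mentioned in the abstract, the sketch below builds one for a simple analytic shape. It is an illustration only, not the authors' code: the function name `sphere_tsdf`, the grid resolution, the sphere radius, and the truncation distance are all assumptions chosen for the example.

```python
import numpy as np

def sphere_tsdf(resolution=32, radius=0.5, truncation=0.1):
    """Return a (resolution, resolution, resolution) TSDF grid for a sphere.

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    # Voxel centres on a regular grid spanning [-1, 1] along each axis.
    axis = np.linspace(-1.0, 1.0, resolution)
    x, y, z = np.meshgrid(axis, axis, axis, indexing="ij")
    # Signed distance to the sphere surface: negative inside, positive outside.
    sdf = np.sqrt(x**2 + y**2 + z**2) - radius
    # Truncate so that only a narrow band around the surface keeps exact distances.
    return np.clip(sdf, -truncation, truncation)

tsdf = sphere_tsdf()
# Voxels whose value lies strictly inside the truncation band are near the surface.
near_surface = np.abs(tsdf) < 0.1
```

A partial scan would give such a grid with reliable values only in the observed regions; the completion model's task is to predict the full grid.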

Problem

Research questions and friction points this paper is trying to address.

Completing 3D shapes from partial scans and images

Reducing GPU memory usage in high-resolution processing

Integrating multimodal 2D and 3D information consistently
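
The memory point above can be made concrete with a little arithmetic. The sketch below compares the number of values in a dense voxel grid with the number in a spatially downsampled latent grid. The resolution of 64, the downsampling factor of 4, and the single latent channel are hypothetical values for illustration; the page does not state the paper's actual settings.

```python
def voxel_count(resolution: int) -> int:
    """Number of values in a dense cubic grid of the given side length."""
    return resolution ** 3

def latent_count(resolution: int, downsample_factor: int, channels: int = 1) -> int:
    """Number of values in a latent grid downsampled along each axis.

    downsample_factor and channels are hypothetical parameters for this example.
    """
    if resolution % downsample_factor != 0:
        raise ValueError("resolution must be divisible by downsample_factor")
    return (resolution // downsample_factor) ** 3 * channels

full = voxel_count(64)            # 64^3 = 262144 values
latent = latent_count(64, 4)      # 16^3 = 4096 values
reduction = full // latent        # 64 times fewer values to diffuse over
```

Because the count grows with the cube of the side length, even a modest per-axis downsampling factor shrinks the diffusion model's input substantially.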

Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion model for 3D shape completion

Multimodal conditioning with 2D images and 3D scans

Discrete latent space with joint 2D-3D supervision
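
As a rough illustration of the image-conditioning idea listed above, the following is a minimal single-head cross-attention sketch in NumPy, where flattened latent voxels act as queries and image features act as keys and values. It is not the paper's architecture: the token counts, feature widths, and random projection matrices are assumptions made for the example.

```python
import numpy as np

def cross_attention(latent_tokens, image_tokens, w_q, w_k, w_v):
    """Single-head cross-attention: latent tokens attend to image tokens."""
    q = latent_tokens @ w_q            # (N, d) queries from the 3D latent
    k = image_tokens @ w_k             # (M, d) keys from image features
    v = image_tokens @ w_v             # (M, d) values from image features
    scores = q @ k.T / np.sqrt(q.shape[-1])
    # Subtract the row-wise maximum for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

rng = np.random.default_rng(0)
latent_tokens = rng.normal(size=(8, 16))   # hypothetical: 2x2x2 latent voxels, 16 channels
image_tokens = rng.normal(size=(5, 32))    # hypothetical: 5 image feature vectors of width 32
w_q = rng.normal(size=(16, 16))
w_k = rng.normal(size=(32, 16))
w_v = rng.normal(size=(32, 16))

attended, weights = cross_attention(latent_tokens, image_tokens, w_q, w_k, w_v)
```

Each latent voxel receives a weighted mixture of image features, which is one common way for 2D information to steer a 3D denoising network.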

🔎 Similar Papers

LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation