🤖 AI Summary
This work addresses cross-category 3D shape completion from partial 3D scans. We propose a 3D latent diffusion model whose design is inspired by 2D latent diffusion models. Methodologically, we design a 3D latent autoencoder to compress TSDF voxel representations and introduce a dual-path conditioning mechanism: (i) cross-attention fuses multi-view image semantics, and (ii) TSDF spatial feature injection incorporates geometric priors from the partial scan. A multi-scale 3D convolutional decoder improves fine-grained reconstruction fidelity. Our key contributions are threefold: (1) an extension of the latent diffusion paradigm to cross-category 3D completion; (2) a single unified model for all object classes, removing the need for a separate diffusion model per class and generalizing to unseen categories; and (3) accuracy and visual realism on par with state-of-the-art methods on the ShapeNet and KITTI benchmarks, at a higher output resolution than existing diffusion-based approaches.
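The dual-path conditioning described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`cross_attend`, `condition_latents`) are hypothetical, there are no learned projections, image tokens serve as both keys and values, and fusing the two paths by element-wise addition is an assumption made only for illustration.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attend(query, image_tokens):
    # Single-head scaled dot-product attention of one latent vector over
    # the image tokens (assumption: tokens act as both keys and values).
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, tok)) / scale
              for tok in image_tokens]
    weights = softmax(scores)
    dim = len(image_tokens[0])
    return [sum(w * tok[d] for w, tok in zip(weights, image_tokens))
            for d in range(dim)]

def condition_latents(latents, scan_features, image_tokens):
    # Path 1: add image features gathered by cross-attention.
    # Path 2: add the spatially aligned partial-scan feature of the same cell.
    # (Assumption: additive fusion; the paper may fuse differently.)
    out = []
    for z, s in zip(latents, scan_features):
        attended = cross_attend(z, image_tokens)
        out.append([zi + ai + si for zi, ai, si in zip(z, attended, s)])
    return out

# Two latent cells of dimension 2, each with an aligned scan feature.
latents = [[0.0, 0.0], [1.0, -1.0]]
scan_features = [[0.5, 0.5], [0.0, 0.0]]
image_tokens = [[1.0, 0.0], [1.0, 0.0]]
conditioned = condition_latents(latents, scan_features, image_tokens)
```

The point of the sketch is the asymmetry between the two paths: image tokens have no fixed spatial correspondence to the latent grid, so they are mixed in through attention weights, whereas scan features already live on the same grid and can be combined cell by cell.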
📝 Abstract
This paper introduces a 3D shape completion approach based on a 3D latent diffusion model that completes shapes, represented as Truncated Signed Distance Functions (TSDFs), from partial 3D scans. Our method combines image-based conditioning through cross-attention with spatial conditioning through the integration of 3D features from the captured partial scans. This dual guidance enables high-fidelity, realistic shape completions at higher resolutions. At the core of our approach is the compression of 3D data into a low-dimensional latent space using an autoencoder inspired by 2D latent diffusion models. This compression facilitates the processing of higher-resolution shapes and allows us to apply a single model across multiple object classes, a significant improvement over existing diffusion-based shape completion methods, which often require a separate diffusion model for each class. We validate our approach on two common shape completion benchmarks, where it performs on par with state-of-the-art methods in accuracy and realism while operating at a higher resolution with a single model for all object classes. We present a comprehensive evaluation of our model, showing that it handles diverse shape completion challenges, even on unseen object classes. The code will be released upon acceptance.
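For readers unfamiliar with the TSDF representation, the following minimal sketch builds one for a sphere on a small voxel grid. It is only an illustration of the representation, not the paper's data pipeline: the function name `sphere_tsdf`, the grid extent, the truncation distance, and the sign convention (negative inside, positive outside) are all assumptions.

```python
import math

def sphere_tsdf(resolution, radius, truncation):
    """Return a resolution^3 nested list of TSDF values for a centered sphere.

    Voxel centers span the cube [-1, 1]^3. Each value is the signed distance
    to the sphere surface, divided by `truncation` and clamped to [-1, 1].
    """
    step = 2.0 / resolution
    coords = [-1.0 + (i + 0.5) * step for i in range(resolution)]
    grid = []
    for x in coords:
        plane = []
        for y in coords:
            row = []
            for z in coords:
                # Exact signed distance to the sphere surface.
                signed_dist = math.sqrt(x * x + y * y + z * z) - radius
                # Truncate: only a narrow band around the surface keeps
                # graded values; everything else saturates at -1 or +1.
                row.append(max(-1.0, min(1.0, signed_dist / truncation)))
            plane.append(row)
        grid.append(plane)
    return grid

tsdf = sphere_tsdf(resolution=8, radius=0.5, truncation=0.25)
```

Because the grid has `resolution ** 3` cells, memory grows cubically with resolution, which is the motivation the abstract gives for running the diffusion process in a compressed latent space rather than on the voxels directly.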