Evaluating Latent Generative Paradigms for High-Fidelity 3D Shape Completion from a Single Depth Image

📅 2025-11-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses high-fidelity 3D shape completion from a single noisy depth image. We systematically compare denoising diffusion probabilistic models (DDPMs) and autoregressive causal Transformers, adapted for generative shape modeling and completion in this setting. Our key insight is that latent-space design, in particular whether the latents are continuous or discrete, strongly governs which paradigm performs best: DDPMs excel at multi-modal completion in continuous latent spaces, whereas autoregressive Transformers match or surpass DDPMs when both operate on the same discrete latent space. Through comparison against a discriminative baseline and an extensive ablation study, we provide empirical evidence that the relative strength of a generative paradigm depends more on latent-space design than on architectural choice per se. On real-world completion from a single depth image, the diffusion model with continuous latents achieves state-of-the-art performance. This work offers practical guidance for architecture selection and latent-space design in 3D generative modeling.
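The key distinction above is whether the latents are continuous or discrete. A minimal sketch of how a discrete latent space can be derived from a continuous one is nearest-neighbour vector quantization against a codebook, as in VQ-VAE-style models. The codebook, function names, and dimensions below are hypothetical; this summary does not specify the paper's actual quantization scheme.

```python
from typing import List, Sequence

# Hypothetical types for illustration: a latent is a short list of floats.
Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: List[Vector], codebook: List[Vector]) -> List[int]:
    """Map each continuous latent vector to the index of its nearest codebook entry.

    The integer indices are discrete tokens of the kind an autoregressive model
    predicts; the continuous vectors are what a diffusion model can denoise directly.
    """
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(tokens: List[int], codebook: List[Vector]) -> List[List[float]]:
    """Replace each discrete token by its codebook vector (a lossy reconstruction)."""
    return [list(codebook[t]) for t in tokens]
```

For example, with the codebook `[[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]`, the latents `[[0.9, 1.2], [-0.8, 0.7], [0.1, -0.2]]` map to the tokens `[1, 2, 0]`. The gap between a latent and its dequantized vector is the information a discrete latent space gives up.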

📝 Abstract

While generative models have been widely adopted across many data modalities, including 3D data, there is still no consensus on which model is best suited for which task. Furthermore, conditioning signals such as text and images are frequently used to steer the generation process, whereas others, such as partial 3D data, have not been thoroughly evaluated. In this work, we compare two of the most promising generative models, Denoising Diffusion Probabilistic Models and Autoregressive Causal Transformers, which we adapt for the tasks of generative shape modeling and completion. We conduct a thorough quantitative evaluation and comparison of both models on both tasks, including a discriminative baseline model and an extensive ablation study. Our results show that (1) the diffusion model with continuous latents outperforms both the discriminative model and the autoregressive approach, delivering state-of-the-art performance on multi-modal shape completion from a single, noisy depth image under realistic conditions, and (2) when both are compared on the same discrete latent space, the autoregressive model can match or exceed diffusion performance on these tasks.
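As a reference point for the diffusion side of this comparison, here is a minimal sketch of the standard DDPM forward (noising) process and the simplified noise-prediction training loss, applied to a single continuous latent vector. It assumes the linear beta schedule of the original DDPM formulation and a generic conditional denoiser; `denoiser`, `cond` (e.g. an encoding of the partial observation), and all function names are hypothetical and do not describe the paper's architecture or hyperparameters.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical denoiser signature: (noisy latent z_t, timestep t, condition) -> predicted noise.
Denoiser = Callable[[List[float], int, object], List[float]]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_1..beta_T."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: Sequence[float]) -> List[float]:
    """alpha_bar_t = prod_{s <= t} (1 - beta_s); it shrinks toward 0 as t grows."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(z0: Sequence[float], t: int, alpha_bars: Sequence[float],
             noise: Sequence[float]) -> List[float]:
    """Draw z_t from q(z_t | z_0) = N(sqrt(alpha_bar_t) z_0, (1 - alpha_bar_t) I)."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(z0, noise)]


def ddpm_loss(z0: Sequence[float], cond: object, denoiser: Denoiser,
              alpha_bars: Sequence[float], rng: random.Random) -> float:
    """Simplified epsilon-prediction loss for one latent z0 and one condition."""
    t = rng.randrange(len(alpha_bars))         # uniformly sampled timestep
    noise = [rng.gauss(0.0, 1.0) for _ in z0]  # standard Gaussian noise
    zt = q_sample(z0, t, alpha_bars, noise)
    pred = denoiser(zt, t, cond)
    # Mean squared error between the true and the predicted noise.
    return sum((p - e) ** 2 for p, e in zip(pred, noise)) / len(z0)
```

With `T = 1000` steps and this schedule, `alpha_bar_T` is close to zero, so `z_T` is nearly pure Gaussian noise; sampling runs the learned denoiser backwards from there, and drawing several samples for the same `cond` is what makes multi-modal completion possible.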

Problem

Research questions and friction points this paper is trying to address.

Evaluating generative models for 3D shape completion

Comparing diffusion and autoregressive models on shape tasks

Completing 3D shapes from single noisy depth images
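The input in this setting is a single noisy depth image. One common way to turn such an image into partial 3D data is pinhole back-projection. The sketch below assumes known camera intrinsics (`fx`, `fy`, `cx`, `cy`) and marks missing pixels as `None`; the paper's actual preprocessing is not described here, so the function and its parameters are illustrative assumptions only.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject_depth(depth: Sequence[Sequence[Optional[float]]],
                      fx: float, fy: float, cx: float, cy: float) -> List[Point]:
    """Convert a depth image (rows indexed by v, columns by u) into camera-frame 3D points.

    Pinhole model (assumed): x = (u - cx) * d / fx, y = (v - cy) * d / fy, z = d.
    Pixels with missing (None), non-finite, or non-positive depth are skipped,
    one simple way to handle holes in a noisy sensor reading.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d is None or not math.isfinite(d) or d <= 0.0:
                continue
            points.append(((u - cx) * d / fx, (v - cy) * d / fy, d))
    return points
```

The resulting points cover only the visible surface; a completion model must hypothesize the occluded remainder, which is why more than one plausible completion can exist.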

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion model with continuous latents for shape completion

Autoregressive model matches or exceeds diffusion on the same discrete latent space

Generative shape completion from a single noisy depth image
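On the autoregressive side, a causal Transformer generates discrete latent tokens one at a time, each conditioned only on the tokens before it. The loop below is a minimal sketch of that sampling procedure in plain Python; `next_token_logits` stands in for a hypothetical trained model (its signature is an assumption), and `cond` for an encoding of the depth observation. Drawing several samples for the same `cond` is one way such a model can produce multiple plausible completions.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical model interface: (tokens generated so far, condition) -> logits over the codebook.
NextTokenLogits = Callable[[List[int], object], Sequence[float]]


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_tokens(next_token_logits: NextTokenLogits, cond: object, length: int,
                  rng: random.Random, temperature: float = 1.0) -> List[int]:
    """Ancestral sampling: draw each token from p(token_i | token_<i, cond)."""
    tokens: List[int] = []
    for _ in range(length):
        # The model sees only the prefix generated so far (causal ordering).
        probs = softmax(next_token_logits(tokens, cond), temperature)
        tokens.append(rng.choices(range(len(probs)), weights=probs, k=1)[0])
    return tokens
```

Lower `temperature` values concentrate probability on the most likely token, while higher values increase sample diversity; the resulting token sequence would then be decoded back to a shape through the discrete latent space.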

🔎 Similar Papers

SC-Diff: 3D Shape Completion with Latent Diffusion Models