BridgeShape: Latent Diffusion Schrödinger Bridge for 3D Shape Completion

📅 2025-06-29
🤖 AI Summary
Existing diffusion-based 3D shape completion methods rely on a conditional denoising paradigm, which does not explicitly model the globally optimal transport path from incomplete to complete shapes. Diffusion in voxel space is also resolution-limited, which hinders fine-grained geometric reconstruction. To address these limitations, the authors propose BridgeShape, a framework that brings Schrödinger bridge theory to 3D shape completion and explicitly models the optimal transport path between the two shape distributions in a compact latent space. A depth-enhanced VQ-VAE fuses self-projected multi-view depth maps with DINOv2 features to improve awareness of geometric structure, and the bridge is run in this latent space rather than in voxel space, which makes it more efficient. Experiments on large-scale benchmarks show that BridgeShape achieves state-of-the-art performance, with higher fidelity at high resolutions and on unseen object categories.

📝 Abstract
Existing diffusion-based 3D shape completion methods typically use a conditional paradigm, injecting incomplete shape information into the denoising network via deep feature interactions (e.g., concatenation, cross-attention) to guide sampling toward complete shapes, often represented by voxel-based distance functions. However, these approaches fail to explicitly model the optimal global transport path, leading to suboptimal completions. Moreover, performing diffusion directly in voxel space imposes resolution constraints, limiting the generation of fine-grained geometric details. To address these challenges, we propose BridgeShape, a novel framework for 3D shape completion via latent diffusion Schrödinger bridge. The key innovations lie in two aspects: (i) BridgeShape formulates shape completion as an optimal transport problem, explicitly modeling the transition between incomplete and complete shapes to ensure a globally coherent transformation. (ii) We introduce a Depth-Enhanced Vector Quantized Variational Autoencoder (VQ-VAE) to encode 3D shapes into a compact latent space, leveraging self-projected multi-view depth information enriched with strong DINOv2 features to enhance geometric structural perception. By operating in a compact yet structurally informative latent space, BridgeShape effectively mitigates resolution constraints and enables more efficient and high-fidelity 3D shape completion. BridgeShape achieves state-of-the-art performance on large-scale 3D shape completion benchmarks, demonstrating superior fidelity at higher resolutions and for unseen object classes.
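
For orientation, the textbook dynamic Schrödinger bridge problem that this kind of formulation builds on can be written as below. This is the generic definition, not necessarily the exact objective used in the paper. The symbols are assumptions introduced here: $\mathbb{Q}$ is a reference path measure (for example a Brownian motion), $p_{\text{incomplete}}$ and $p_{\text{complete}}$ are the distributions of latent codes for incomplete and complete shapes, and the assignment of each distribution to time 0 or time 1 is a convention chosen for illustration.

```latex
\mathbb{P}^{\star}
  = \arg\min_{\mathbb{P}} \; \mathrm{KL}\!\left(\mathbb{P} \,\|\, \mathbb{Q}\right)
  \quad \text{s.t.} \quad
  \mathbb{P}_{0} = p_{\text{incomplete}}, \qquad
  \mathbb{P}_{1} = p_{\text{complete}}
```

Here $\mathbb{P}_{0}$ and $\mathbb{P}_{1}$ denote the marginals of the path measure $\mathbb{P}$ at the start and end times. Both endpoints are data distributions, which is what distinguishes a bridge from a conditional denoiser that starts from pure noise.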
Problem

Research questions and friction points this paper is trying to address.

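Conditional diffusion methods do not explicitly model the optimal global transport path from incomplete to complete shapes
Diffusion performed directly in voxel space is resolution-limited, which restricts fine-grained geometric detail
Latent encodings of 3D shapes need stronger geometric structure awareness, which motivates depth-augmented VQ-VAE encoding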
Innovation

Methods, ideas, or system contributions that make the work stand out.

Latent diffusion Schrödinger bridge for optimal transport
Depth-Enhanced VQ-VAE for compact latent encoding
Multi-view depth with DINOv2 for geometric perception
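
To make the "compact latent encoding" item more concrete, here is a minimal sketch of the vector-quantization step that a generic VQ-VAE performs: each continuous latent vector is replaced by its nearest codebook entry. It is not the paper's Depth-Enhanced VQ-VAE (there is no encoder, decoder, depth projection, or DINOv2 feature here), and the names `quantize`, `squared_distance`, the toy codebook, and the toy latents are all hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each latent vector to the index and value of its nearest code."""
    indices: List[int] = []
    quantized: List[Vector] = []
    for z in latents:
        # Nearest-neighbour lookup over the whole codebook.
        idx = min(
            range(len(codebook)),
            key=lambda k: squared_distance(z, codebook[k]),
        )
        indices.append(idx)
        quantized.append(codebook[idx])
    return indices, quantized


# Toy example (assumed values): three 2-D codes and three latent vectors.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
latents = [(0.1, -0.2), (0.9, 1.2), (-0.8, 1.7)]

indices, quantized = quantize(latents, codebook)
print(indices)  # -> [0, 1, 2]
```

In a full VQ-VAE the latents would come from an encoder applied to the 3D shape and the codebook would be learned jointly with it; the lookup itself stays this simple.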