Representing 3D Shapes With 64 Latent Vectors for 3D Diffusion Models

📅 2025-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address redundant latent representations and the high computational cost of 3D diffusion models, this paper proposes COD-VAE, a variational autoencoder with a two-stage design that encodes 3D shapes, given as point clouds, into a compact set of just 64 1D latent vectors. The encoder progressively compresses the input point cloud through intermediate point patches. The decoder reconstructs dense triplanes from the latent vectors rather than decoding neural fields directly. Uncertainty-guided token pruning then skips computation in simpler regions, so the decoder allocates resources adaptively. Experiments show that COD-VAE achieves 16× compression relative to the baseline while maintaining reconstruction quality, and this yields a 20.8× speedup in generation. These results suggest that a large number of latent vectors is not a prerequisite for high-quality 3D reconstruction and generation.
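The progressive encoder (points → intermediate point patches → 64 latent vectors) can be pictured with the minimal NumPy sketch below. It is not the authors' implementation. Three pieces are illustrative assumptions, borrowed from common point-cloud VAE designs:

- patch centers are chosen by farthest point sampling;
- each compression stage is a single-head cross-attention;
- fixed random matrices stand in for the learned point embedding and the learnable latent queries.

All function names and sizes are hypothetical.

```python
import numpy as np

def farthest_point_sampling(points: np.ndarray, k: int) -> np.ndarray:
    """Greedily pick k point indices that are spread far apart."""
    chosen = [0]  # start from an arbitrary point (index 0)
    dist = np.linalg.norm(points - points[0], axis=1)
    for _ in range(1, k):
        idx = int(np.argmax(dist))  # point farthest from everything chosen so far
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return np.array(chosen)

def cross_attention(queries: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product attention: each query pools over all keys."""
    scores = queries @ keys.T / np.sqrt(queries.shape[1])
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ values

def encode(points: np.ndarray, num_patches: int = 256, num_latents: int = 64,
           dim: int = 32, seed: int = 0) -> np.ndarray:
    """Hypothetical two-step compression: N points -> P patches -> num_latents vectors."""
    rng = np.random.default_rng(seed)
    embed = rng.standard_normal((3, dim))  # assumption: stands in for a learned point embedding
    feats = points @ embed                 # (N, dim)
    # Step 1: compress N points into P intermediate point patches.
    centers = feats[farthest_point_sampling(points, num_patches)]  # (P, dim)
    patches = cross_attention(centers, feats, feats)               # (P, dim)
    # Step 2: compress the patches into a compact set of latent vectors.
    latent_queries = rng.standard_normal((num_latents, dim))  # assumption: stands in for learnable queries
    return cross_attention(latent_queries, patches, patches)  # (num_latents, dim)

pts = np.random.default_rng(1).uniform(-1.0, 1.0, size=(2048, 3))
z = encode(pts)
print(z.shape)  # (64, 32)
```

The intermediate patch step means the final 64 queries attend over a few hundred patch summaries rather than every input point. This is one plausible reading of "progressively compresses via intermediate point patches".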

📝 Abstract

Constructing a compressed latent space through a variational autoencoder (VAE) is key to efficient 3D diffusion models. This paper introduces COD-VAE, a VAE that encodes 3D shapes into a COmpact set of 1D latent vectors without sacrificing quality. COD-VAE introduces a two-stage autoencoder scheme to improve compression and decoding efficiency. First, our encoder block progressively compresses point clouds into compact latent vectors via intermediate point patches. Second, our triplane-based decoder reconstructs dense triplanes from latent vectors instead of directly decoding neural fields, significantly reducing the computational overhead of neural field decoding. Finally, we propose uncertainty-guided token pruning, which allocates resources adaptively by skipping computations in simpler regions and improves decoder efficiency. Experimental results demonstrate that COD-VAE achieves 16x compression compared to the baseline while maintaining quality. This enables a 20.8x speedup in generation, highlighting that a large number of latent vectors is not a prerequisite for high-quality reconstruction and generation.
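The abstract does not spell out why decoding to a triplane is cheaper. A plausible reason: the heavy network that reads the latent vectors runs once to produce the triplane. After that, each of the many occupancy queries needs only three bilinear lookups and a small MLP. This is a minimal NumPy sketch of that per-point query step under common triplane conventions, which are assumptions rather than details from the paper:

- an xy / xz / yz plane layout;
- coordinates in [-1, 1];
- summed plane features;
- a two-layer ReLU MLP that outputs an occupancy logit.

The triplane and MLP weights are random placeholders.

```python
import numpy as np

def bilinear_sample(plane: np.ndarray, uv: np.ndarray) -> np.ndarray:
    """Sample a (C, R, R) feature plane at M continuous coords uv in [-1, 1]^2 -> (M, C)."""
    _, R, _ = plane.shape
    xy = (uv + 1.0) * 0.5 * (R - 1)  # map [-1, 1] to pixel coordinates [0, R - 1]
    x0 = np.clip(np.floor(xy[:, 0]).astype(int), 0, R - 2)
    y0 = np.clip(np.floor(xy[:, 1]).astype(int), 0, R - 2)
    tx, ty = xy[:, 0] - x0, xy[:, 1] - y0  # fractional offsets in [0, 1]
    # Indexing convention: plane[channel, row (v), column (u)].
    f00, f01 = plane[:, y0, x0], plane[:, y0, x0 + 1]
    f10, f11 = plane[:, y0 + 1, x0], plane[:, y0 + 1, x0 + 1]
    out = (f00 * (1 - tx) * (1 - ty) + f01 * tx * (1 - ty)
           + f10 * (1 - tx) * ty + f11 * tx * ty)
    return out.T

def query_triplane(triplane: np.ndarray, pts: np.ndarray,
                   w1: np.ndarray, w2: np.ndarray) -> np.ndarray:
    """triplane: (3, C, R, R) holding the xy, xz, yz planes; pts: (M, 3) in [-1, 1]^3."""
    feat = (bilinear_sample(triplane[0], pts[:, [0, 1]])
            + bilinear_sample(triplane[1], pts[:, [0, 2]])
            + bilinear_sample(triplane[2], pts[:, [1, 2]]))  # (M, C)
    hidden = np.maximum(feat @ w1, 0.0)  # small ReLU MLP (hypothetical sizes)
    return (hidden @ w2)[:, 0]           # (M,) occupancy logits

rng = np.random.default_rng(0)
C, R = 8, 32
triplane = rng.standard_normal((3, C, R, R))
w1, w2 = rng.standard_normal((C, 16)), rng.standard_normal((16, 1))
logits = query_triplane(triplane, rng.uniform(-1.0, 1.0, (1000, 3)), w1, w2)
print(logits.shape)  # (1000,)
```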

Problem

3D diffusion models are costly when the VAE latent space contains a large number of latent vectors.

Decoding neural fields directly from latent vectors adds heavy computational overhead.

Decoders spend the same computation on simple and complex regions of a shape.

Innovation

COD-VAE encodes 3D shapes into compact 1D latent vectors.

Two-stage autoencoder improves compression and decoding efficiency.

Uncertainty-guided token pruning enhances decoder efficiency adaptively.
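Uncertainty-guided token pruning can be illustrated with the minimal NumPy sketch below. Only the most uncertain decoder tokens go through an expensive refinement step; the rest pass through unchanged. The paper's source for the uncertainty scores and its exact selection rule are not given here, so the sketch makes three assumptions:

- per-token scores are supplied, for example by a small prediction head, and higher means a harder region;
- selection is a fixed-ratio top-k;
- `refine` stands in for a transformer block.

The names `prune_and_refine`, `keep_ratio`, and `refine` are hypothetical.

```python
import numpy as np

def prune_and_refine(tokens: np.ndarray, uncertainty: np.ndarray,
                     keep_ratio: float, refine):
    """Apply `refine` only to the top keep_ratio fraction of tokens by uncertainty.

    tokens: (T, D) decoder tokens; uncertainty: (T,) scores, higher = harder region.
    Skipped tokens are copied through unchanged, so no compute is spent on them.
    """
    k = max(1, int(round(keep_ratio * tokens.shape[0])))
    keep = np.argsort(-uncertainty)[:k]  # indices of the k most uncertain tokens
    out = tokens.copy()
    out[keep] = refine(tokens[keep])     # expensive step runs on k tokens, not T
    return out, keep

def toy_refine(x: np.ndarray) -> np.ndarray:
    """Placeholder for an expensive transformer block."""
    return x + 1.0

rng = np.random.default_rng(0)
tokens = rng.standard_normal((1024, 16))
uncertainty = rng.uniform(size=1024)
out, keep = prune_and_refine(tokens, uncertainty, keep_ratio=0.25, refine=toy_refine)
print(len(keep))  # 256
```

With `keep_ratio=0.25`, the refinement block processes a quarter of the tokens. A learned or thresholded ratio would adapt this per shape, which is closer to the adaptive allocation described above.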

🔎 Similar Papers

SC-Diff: 3D Shape Completion with Latent Diffusion Models