🤖 AI Summary
To address the slow sampling and limited generation quality of point-cloud-based 3D shape synthesis, this paper proposes the Multi-scale Latent Point Consistency Model (MLPCM). MLPCM builds a hierarchical representation, from the point level up to multiple superpoint levels, within a latent diffusion framework, and combines 3D spatial attention with multi-scale feature fusion for efficient denoising. Crucially, it introduces consistency distillation, the first such application to 3D point cloud generation, compressing the diffusion prior into a one-step generator that reconstructs fine-grained point clouds from the multi-level superpoint representations with high fidelity. Evaluated on ShapeNet and ShapeNet-Vol, MLPCM achieves a 100× sampling speedup over standard diffusion samplers while surpassing state-of-the-art diffusion models on both Fréchet Inception Distance (FID) and Jensen–Shannon Divergence (JSD), improving generation quality, geometric detail fidelity, and structural diversity.
📝 Abstract
Consistency Models (CMs) have significantly accelerated sampling in diffusion models, yielding impressive results in high-resolution image synthesis. To extend these advances to point-cloud-based 3D shape generation, we propose a novel Multi-scale Latent Point Consistency Model (MLPCM). MLPCM follows a latent diffusion framework and introduces hierarchical latent representations, ranging from the point level to multiple superpoint levels, each corresponding to a different spatial resolution. We design a multi-scale latent integration module together with 3D spatial attention to effectively denoise the point-level latent representation conditioned on those from the superpoint levels. Additionally, we propose a latent consistency model, learned through consistency distillation, that compresses the diffusion prior into a one-step generator. This significantly improves sampling efficiency while preserving the performance of the original teacher model. Extensive experiments on the standard benchmarks ShapeNet and ShapeNet-Vol demonstrate that MLPCM achieves a 100× speedup in generation while surpassing state-of-the-art diffusion models in both shape quality and diversity.
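The one-step generation described above rests on the consistency-function parameterization from the Consistency Models literature: the network output is blended with the input so that the function reduces to the identity at the noise floor, which is what allows a single evaluation to map any noisy latent straight to a clean sample. The sketch below illustrates only that generic parameterization, not MLPCM's actual architecture; the `sigma_data` and `eps` values and the skip/output schedules are standard assumptions from the CM formulation, and `net` is a hypothetical stand-in for the trained denoiser.

```python
import numpy as np

SIGMA_DATA = 0.5   # assumed data-scale constant from the CM parameterization
EPS = 0.002        # assumed minimum noise level (boundary time)

def c_skip(t):
    # Skip-connection coefficient; equals 1 exactly at t = EPS.
    return SIGMA_DATA**2 / ((t - EPS)**2 + SIGMA_DATA**2)

def c_out(t):
    # Network-output coefficient; equals 0 exactly at t = EPS.
    return SIGMA_DATA * (t - EPS) / np.sqrt(t**2 + SIGMA_DATA**2)

def consistency_fn(network, x, t):
    # f_theta(x, t) = c_skip(t) * x + c_out(t) * F_theta(x, t)
    # By construction f_theta(x, EPS) = x (the boundary condition),
    # so one evaluation at a large t yields a sample directly.
    return c_skip(t) * x + c_out(t) * network(x, t)

# Hypothetical placeholder "network": any function of (x, t).
net = lambda x, t: -x

x = np.array([1.0, -2.0, 0.5])
# Boundary condition: at t = EPS the consistency function is the identity.
assert np.allclose(consistency_fn(net, x, EPS), x)
```

In distillation, the student is trained so that `consistency_fn` gives matching outputs at adjacent noise levels along the same probability-flow ODE trajectory; at inference, a single call at the largest noise level replaces the teacher's long sampling chain, which is the source of the reported 100× speedup.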