GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation

📅 2024-11-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing 3D generative methods suffer from limitations in multi-modal input compatibility, latent space structuring, and geometry-texture disentanglement. This paper introduces a high-fidelity, interactive 3D generation framework supporting multi-modal conditioning—namely, point clouds, text, and single-view images. The method couples a Variational Autoencoder with multi-modal conditional modeling. Key contributions include: (1) a point-cloud-structured latent space built from multi-view posed RGB-D-N renderings, enabling intrinsically 3D-aware editing; and (2) a cascaded latent flow-matching architecture that significantly improves geometry-texture disentanglement and generation controllability. Evaluated on multiple benchmarks, the framework achieves state-of-the-art performance in both text- and image-conditioned 3D generation, delivering superior fidelity and editing flexibility, and establishing a scalable, high-quality paradigm for native 3D generation.

📝 Abstract
While 3D content generation has advanced significantly, existing methods still face challenges with input formats, latent space design, and output representations. This paper introduces a novel 3D generation framework that addresses these challenges, offering scalable, high-quality 3D generation with an interactive Point Cloud-structured Latent space. Our framework employs a Variational Autoencoder (VAE) with multi-view posed RGB-D(epth)-N(ormal) renderings as input, using a unique latent space design that preserves 3D shape information, and incorporates a cascaded latent flow-based model for improved shape-texture disentanglement. The proposed method, GaussianAnything, supports multi-modal conditional 3D generation, allowing for point cloud, caption, and single image inputs. Notably, the newly proposed latent space naturally enables geometry-texture disentanglement, thus allowing 3D-aware editing. Experimental results demonstrate the effectiveness of our approach on multiple datasets, outperforming existing native 3D methods in both text- and image-conditioned 3D generation.
Problem

Research questions and friction points this paper is trying to address.

Challenges in 3D generation input formats and latent space design
Need for scalable, high-quality 3D object generation
Geometry-texture disentanglement in 3D-aware editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive Point Cloud-structured Latent space
VAE with multi-view RGB-D-N renderings
Cascaded latent flow-matching model for geometry-texture disentanglement
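The cascaded design above can be sketched as a two-stage flow-matching sampler: stage 1 integrates a learned velocity field to produce point positions (geometry), and stage 2 integrates a second, shape-conditioned field to produce per-point texture latents. This is a minimal illustrative sketch, not the paper's implementation: the velocity "networks" are stand-in closures built from a placeholder target, and all dimensions and step counts are assumptions.

```python
import numpy as np

def euler_sample(velocity, x0, n_steps=32):
    """Integrate dx/dt = velocity(x, t) from t=0 to t=1 (rectified-flow style)."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x += dt * velocity(x, i * dt)
    return x

def make_toy_velocity(target):
    # Stand-in for a learned conditional velocity field. Along the linear
    # path x_t = (1 - t) x0 + t x1 the true velocity x1 - x0 equals
    # (target - x_t) / (1 - t); a real model would predict this from data.
    def v(x, t):
        return (target - x) / max(1.0 - t, 1e-3)
    return v

rng = np.random.default_rng(0)
n_points, feat_dim = 256, 8  # assumed sizes, purely illustrative

# Stage 1: geometry — sample 3D point positions from Gaussian noise.
shape_target = rng.normal(scale=0.5, size=(n_points, 3))  # placeholder "data"
shape = euler_sample(make_toy_velocity(shape_target),
                     rng.normal(size=(n_points, 3)))

# Stage 2: texture — per-point features conditioned on the generated shape
# (conditioning here is only illustrative: features derived from positions).
tex_target = np.tanh(shape @ rng.normal(size=(3, feat_dim)))
texture = euler_sample(make_toy_velocity(tex_target),
                       rng.normal(size=(n_points, feat_dim)))

print(shape.shape, texture.shape)
```

Because the two stages are separate flows over a point-cloud-structured latent, geometry can be edited or resampled while the texture stage is rerun conditioned on the new shape, which is the disentanglement property the framework targets.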