Native and Compact Structured Latents for 3D Generation

📅 2025-12-16
🤖 AI Summary
Existing 3D generation methods struggle to model complex topologies (e.g., non-manifold geometry, open surfaces) and high-fidelity appearance, which limits realism and generalization. This paper proposes a structured latent representation learned from native 3D data. First, it introduces O-Voxel, a sparse voxel structure that supports arbitrary topologies and encodes both geometry and appearance. Second, it designs a Sparse Compression VAE that achieves high spatial compression and a compact latent space. Third, it trains flow-matching models with 4B parameters on public 3D asset datasets. The paper reports that the generated assets exceed existing models in geometry and material quality while inference remains efficient.

๐Ÿ“ Abstract
Recent advancements in 3D generative modeling have significantly improved the generation realism, yet the field is still hampered by existing representations, which struggle to capture assets with complex topologies and detailed appearance. This paper present an approach for learning a structured latent representation from native 3D data to address this challenge. At its core is a new sparse voxel structure called O-Voxel, an omni-voxel representation that encodes both geometry and appearance. O-Voxel can robustly model arbitrary topology, including open, non-manifold, and fully-enclosed surfaces, while capturing comprehensive surface attributes beyond texture color, such as physically-based rendering parameters. Based on O-Voxel, we design a Sparse Compression VAE which provides a high spatial compression rate and a compact latent space. We train large-scale flow-matching models comprising 4B parameters for 3D generation using diverse public 3D asset datasets. Despite their scale, inference remains highly efficient. Meanwhile, the geometry and material quality of our generated assets far exceed those of existing models. We believe our approach offers a significant advancement in 3D generative modeling.

Problem

Research questions and friction points this paper is trying to address.

Develops a structured latent representation for 3D generation
Introduces O-Voxel to model complex topology and appearance
Creates a compact latent space for efficient 3D asset generation
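
To make the idea of a sparse voxel structure with per-voxel surface attributes more concrete, here is a minimal sketch in Python. It is not the paper's O-Voxel layout, which this page does not describe. The class `SparseVoxelGrid`, the helper `downsample`, and the attribute names `roughness` and `metallic` are all hypothetical. The fixed block averaging is only a stand-in for the learned spatial compression of the Sparse Compression VAE.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

Coord = Tuple[int, int, int]


@dataclass
class SparseVoxelGrid:
    """Toy sparse grid (hypothetical): only occupied cells are stored."""
    resolution: int
    cells: Dict[Coord, Dict[str, float]] = field(default_factory=dict)

    def set_cell(self, coord: Coord, **attrs: float) -> None:
        # Reject coordinates outside the resolution^3 cube.
        if not all(0 <= c < self.resolution for c in coord):
            raise ValueError(f"coordinate {coord} is outside the grid")
        self.cells[coord] = dict(attrs)

    def occupancy_ratio(self) -> float:
        # Fraction of the dense grid that is actually stored.
        return len(self.cells) / self.resolution ** 3


def downsample(grid: SparseVoxelGrid, factor: int) -> SparseVoxelGrid:
    """Merge each factor^3 block into one coarse cell by averaging attributes.

    Assumes every cell carries the same attribute names. This is a fixed
    averaging rule used for illustration, not a learned encoder.
    """
    if factor <= 0 or grid.resolution % factor != 0:
        raise ValueError("factor must be positive and divide the resolution")
    sums: Dict[Coord, Dict[str, float]] = {}
    counts: Dict[Coord, int] = {}
    for (x, y, z), attrs in grid.cells.items():
        key = (x // factor, y // factor, z // factor)
        acc = sums.setdefault(key, {})
        for name, value in attrs.items():
            acc[name] = acc.get(name, 0.0) + value
        counts[key] = counts.get(key, 0) + 1
    coarse = SparseVoxelGrid(resolution=grid.resolution // factor)
    for key, acc in sums.items():
        coarse.cells[key] = {n: v / counts[key] for n, v in acc.items()}
    return coarse


# Usage: three occupied cells in an 8^3 grid, compressed 2x per axis.
grid = SparseVoxelGrid(resolution=8)
grid.set_cell((0, 0, 0), roughness=0.2, metallic=1.0)
grid.set_cell((1, 1, 1), roughness=0.4, metallic=0.0)
grid.set_cell((7, 7, 7), roughness=0.9, metallic=0.0)
coarse = downsample(grid, factor=2)
```

Storing only occupied cells keeps memory proportional to the surface rather than the full volume, which is the usual motivation for sparse voxel representations of surface data.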

Innovation

Methods, ideas, or system contributions that make the work stand out.

O-Voxel sparse voxel structure for geometry and appearance
Sparse Compression VAE for compact latent space
Large-scale flow-matching models with efficient inference
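
The flow-matching item above can be illustrated with a generic objective. The sketch below assumes a straight-line path between a noise sample at t = 0 and a data sample at t = 1, which is a common choice but is not confirmed by this page as the paper's formulation. The function names and the `model` callable are hypothetical, and plain Python lists stand in for latent tensors.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# A velocity model maps (x_t, t) to a predicted velocity (hypothetical interface).
VelocityModel = Callable[[Sequence[float], float], Vector]


def interpolate(x0: Sequence[float], x1: Sequence[float], t: float) -> Vector:
    """Point on the straight line from x0 (noise) to x1 (data) at time t."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0: Sequence[float], x1: Sequence[float]) -> Vector:
    """Time derivative of the straight-line path, constant in t."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(model: VelocityModel, x0: Sequence[float],
                       x1: Sequence[float], t: float) -> float:
    """Mean squared error between predicted and target velocity at x_t."""
    xt = interpolate(x0, x1, t)
    pred = model(xt, t)
    target = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(model: VelocityModel, x0: Sequence[float], steps: int) -> Vector:
    """Integrate dx/dt = model(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = model(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Usage with an oracle model that already knows the true velocity.
noise = [0.0, 0.0]
data = [1.0, -2.0]


def oracle(x: Sequence[float], t: float) -> Vector:
    return target_velocity(noise, data)


loss = flow_matching_loss(oracle, noise, data, t=0.5)
sample = euler_sample(oracle, noise, steps=4)
```

With a straight-line path the target velocity does not depend on t, so a well-trained model can be integrated in few steps. That is one general reason flow-matching samplers can be fast, although the page does not say which sampler the paper uses.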