LATTICE: Democratize High-Fidelity 3D Generation at Scale

📅 2025-11-24

📈 Citations: 0

✨ Influential: 0

career value

238K/year

🤖 AI Summary

3D generative models face fundamental trade-offs between geometric fidelity, computational efficiency, and scalability due to the difficulty of modeling complex spatial structures and the high cost of volumetric representation. To address this, we propose VoxSet—a semi-structured voxel representation that explicitly encodes spatial topology under high compression, enabling position-aware generation and token-level test-time scaling. Our two-stage framework first generates sparse voxel anchors, then refines geometry to arbitrary resolution via a correction-flow Transformer. This unifies structured 3D modeling with efficient decoding, drastically reducing training overhead. Evaluated across multiple benchmarks, VoxSet achieves state-of-the-art performance in fidelity, diversity, and scalability metrics. It enables high-fidelity, large-scale, and flexible inference for 3D asset generation—supporting resolution-agnostic output and interactive editing without retraining.

Technology Category

Application Category

📝 Abstract

We present LATTICE, a new framework for high-fidelity 3D asset generation that bridges the quality and scalability gap between 3D and 2D generative models. While 2D image synthesis benefits from fixed spatial grids and well-established transformer architectures, 3D generation remains fundamentally more challenging due to the need to predict both spatial structure and detailed geometric surfaces from scratch. These challenges are exacerbated by the computational complexity of existing 3D representations and the lack of structured and scalable 3D asset encoding schemes. To address this, we propose VoxSet, a semi-structured representation that compresses 3D assets into a compact set of latent vectors anchored to a coarse voxel grid, enabling efficient and position-aware generation. VoxSet retains the simplicity and compression advantages of prior VecSet methods while introducing explicit structure into the latent space, allowing positional embeddings to guide generation and enabling strong token-level test-time scaling. Built upon this representation, LATTICE adopts a two-stage pipeline: first generating a sparse voxelized geometry anchor, then producing detailed geometry using a rectified flow transformer. Our method is simple at its core, but supports arbitrary resolution decoding, low-cost training, and flexible inference schemes, achieving state-of-the-art performance on various aspects, and offering a significant step toward scalable, high-quality 3D asset creation.

Problem

Research questions and friction points this paper is trying to address.

Bridging quality and scalability gaps in 3D generation

Addressing computational complexity in 3D asset encoding

Enabling efficient and structured high-fidelity 3D creation

Innovation

Methods, ideas, or system contributions that make the work stand out.

VoxSet representation compresses 3D assets into latent vectors

Two-stage pipeline generates sparse geometry then detailed surfaces

Rectified flow transformer enables efficient high-fidelity 3D generation

🔎 Similar Papers

No similar papers found.