🤖 AI Summary
Existing 3D generation methods model sparse voxels with continuous diffusion followed by post-hoc thresholding, which hinders efficient editing and uncertainty quantification. This work proposes DVD (Discrete Voxel Diffusion), a framework that, for the first time, directly applies discrete diffusion to model 3D sparse voxel priors by treating voxel occupancy as a native discrete variable. DVD uses predictive entropy as an uncertainty measure to identify ambiguous voxel regions and complicated samples, which supports sample-level quality assessment and data filtering. A lightweight fine-tuning strategy based on block-structured perturbation patterns lets the model inpaint and edit voxels within a single sampling round, with negligible auxiliary computation and no additional model evaluations. Beyond quality gains, explicit categorical modeling makes the generation dynamics more interpretable.
📝 Abstract
We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, DVD provides more interpretable generation dynamics through explicit categorical modeling. Furthermore, we leverage the predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and complicated samples, facilitating tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This approach empowers the model to inpaint and edit voxels within a single sampling round, requiring negligible auxiliary computation and no additional model evaluations.
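To make the uncertainty idea concrete, the following is a minimal sketch of per-voxel predictive entropy for binary occupancy, written with NumPy. It is an illustration under stated assumptions, not the authors' implementation: the function names `predictive_entropy` and `sample_uncertainty`, the mean-entropy sample score, the toy probabilities, and the threshold of half the maximum entropy are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def predictive_entropy(p_occ, eps=1e-12):
    """Per-voxel binary entropy (in nats) of predicted occupancy probabilities.

    p_occ holds the model's probability that each voxel is occupied.
    Probabilities are clipped away from 0 and 1 so the logarithms stay finite.
    """
    p = np.clip(np.asarray(p_occ, dtype=np.float64), eps, 1.0 - eps)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def sample_uncertainty(p_occ):
    """Hypothetical sample-level score: mean per-voxel entropy over the grid."""
    return float(predictive_entropy(p_occ).mean())


# Toy 2x2x2 grid of occupancy probabilities (made-up values for illustration).
p = np.array([[[0.01, 0.50], [0.99, 0.50]],
              [[0.01, 0.99], [0.01, 0.99]]])

H = predictive_entropy(p)

# Assumed rule: flag voxels whose entropy exceeds half of the maximum (ln 2).
ambiguous = H > 0.5 * np.log(2.0)

print("ambiguous voxels:", int(ambiguous.sum()))
print("sample score:", sample_uncertainty(p))
```

Entropy peaks at ln 2 when the predicted probability is 0.5 and approaches zero for confident predictions, so high-entropy voxels mark ambiguous regions. An aggregate such as the mean could then be used to rank samples for filtering, although the aggregation DVD actually uses is not stated here.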