🤖 AI Summary
Existing data-driven approaches to indoor scene layout typically rely on bounding boxes or implicit representations and ignore the volumetric structure of objects, which frequently produces object collisions and structural entanglement, especially in dense scenes. This work proposes VoxScene, an anchor-conditioned voxel diffusion framework that, for the first time, integrates discrete voxel representations with diffusion models to generate occupancy grids sequentially in an explicit, object-centric manner. Because each voxel can belong to only one object, the representation resolves spatial ambiguities; conditioning each generation step on prior anchors and local context then yields collision-free layouts that remain physically plausible and diverse in shape. The method reports state-of-the-art physical plausibility and shape diversity, supports the synthesis of dense, complex scenes, and produces voxel grids that can serve as geometric queries for asset retrieval.
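The voxel-exclusivity idea can be illustrated with a minimal sketch. This is not the VoxScene implementation: the sparse set-of-coordinates representation, the `box_voxels` and `place_object` helpers, and the rule of discarding already-claimed voxels are all assumptions made for illustration, and the boxes stand in for shapes that a diffusion model would generate. The sketch only shows why a shared grid in which each voxel has a single owner cannot contain collisions.

```python
from typing import Dict, Iterable, Set, Tuple

Voxel = Tuple[int, int, int]


def box_voxels(origin: Voxel, size: Voxel) -> Set[Voxel]:
    """Voxels of an axis-aligned box (hypothetical stand-in for a generated shape)."""
    ox, oy, oz = origin
    sx, sy, sz = size
    return {
        (ox + i, oy + j, oz + k)
        for i in range(sx)
        for j in range(sy)
        for k in range(sz)
    }


def place_object(
    owner: Dict[Voxel, int], obj_id: int, proposed: Iterable[Voxel]
) -> Set[Voxel]:
    """Commit only the proposed voxels that are still free.

    `owner` maps every occupied voxel to exactly one object id, so two
    objects can never share a voxel (assumed exclusivity rule).
    """
    committed = {v for v in proposed if v not in owner}
    for v in committed:
        owner[v] = obj_id
    return committed


# Objects are added one at a time to the same scene grid.
owner: Dict[Voxel, int] = {}
bed = place_object(owner, 0, box_voxels((0, 0, 0), (4, 2, 1)))
# The second proposal overlaps the bed at x == 3; those voxels are rejected.
nightstand = place_object(owner, 1, box_voxels((3, 0, 0), (2, 2, 1)))

print(len(bed), len(nightstand), len(bed & nightstand))
```

In a learned pipeline the already-occupied voxels would more plausibly be supplied as conditioning, so that the model avoids them during generation rather than having them masked afterwards; the hard mask here is simply the shortest way to show the invariant.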
📝 Abstract
We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the generality of our method, which achieves state-of-the-art physical plausibility and greater shape diversity than existing layout planners.
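To make the retrieval step concrete, the sketch below uses a generated voxel grid as a query against a small asset library. The abstract does not state which similarity measure is used, so intersection-over-union (IoU) is an assumption here, as are the function names `voxel_iou` and `retrieve_asset`, the toy library, and the premise that assets are pre-voxelized at the same resolution and aligned in a shared canonical frame.

```python
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection-over-union of two sparse occupancy sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def retrieve_asset(query: Set[Voxel], library: Dict[str, Set[Voxel]]) -> str:
    """Return the name of the library asset with the highest IoU to the query."""
    return max(library, key=lambda name: voxel_iou(query, library[name]))


# Hypothetical asset library: a low, wide table and a tall, narrow chair.
library: Dict[str, Set[Voxel]] = {
    "table": {(x, y, 0) for x in range(4) for y in range(2)},
    "chair": {(x, y, z) for x in range(2) for y in range(2) for z in range(3)},
}

# Query: a table-like shape with one corner voxel missing, as an imperfect
# generated grid might look.
query = library["table"] - {(3, 1, 0)}

best = retrieve_asset(query, library)
print(best, voxel_iou(query, library[best]))
```

A purely geometric score like this one rewards overall shape agreement, which is why a more detailed query grid can discriminate between assets that would share the same bounding box.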