VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

📅 2026-05-16

🤖 AI Summary

Existing data-driven approaches to indoor scene layout typically rely on bounding boxes or implicit representations that ignore volumetric structure. As a result, they frequently produce object collisions and structural entanglement, especially in dense scenes. This work proposes an anchor-conditioned voxel diffusion framework that combines discrete voxel representations with diffusion models to generate per-object occupancy grids sequentially, in an explicit, object-centric manner. Because a discrete voxel can belong to at most one object, this exclusivity resolves spatial ambiguities. Conditioning each step on prior anchors and local context then yields collision-free layouts that remain physically plausible and diverse in shape. The method reports state-of-the-art physical plausibility and greater shape diversity than existing layout planners. It also supports the synthesis of dense, complex scenes, and its high-fidelity voxel grids serve as geometric queries for asset retrieval.
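The voxel-exclusivity argument can be made concrete with a small sketch. It assumes each object's occupancy is a set of integer (x, y, z) voxel coordinates and enforces exclusivity by masking out voxels already claimed by earlier objects. The names place_sequentially, is_collision_free and box are hypothetical. The masking step is only a stand-in for the learned, anchor-conditioned diffusion model, which this sketch does not reproduce.

```python
"""Sketch: sequential object placement into a shared voxel occupancy grid.

Hypothetical illustration of voxel exclusivity, NOT the VoxScene model:
each object's occupancy is a set of integer (x, y, z) voxel coordinates,
and voxels already claimed by an earlier object are masked out before the
next object is committed, so no two objects can share a voxel.
"""
from typing import Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]


def place_sequentially(proposals: List[Tuple[str, Set[Voxel]]]) -> Dict[str, Set[Voxel]]:
    """Commit object proposals in order; later objects lose contested voxels.

    `proposals` stands in for the per-object occupancies a generator might
    emit (hypothetical input format). Returns mutually disjoint occupancies.
    """
    occupied: Set[Voxel] = set()            # scene-level occupancy so far
    committed: Dict[str, Set[Voxel]] = {}
    for name, voxels in proposals:
        free = voxels - occupied            # exclusivity mask: drop claimed voxels
        committed[name] = free
        occupied |= free                    # claim the remaining voxels
    return committed


def is_collision_free(layout: Dict[str, Set[Voxel]]) -> bool:
    """True if no voxel belongs to more than one object."""
    seen: Set[Voxel] = set()
    for voxels in layout.values():
        if seen & voxels:                   # any shared voxel is a collision
            return False
        seen |= voxels
    return True


def box(x0: int, y0: int, z0: int, x1: int, y1: int, z1: int) -> Set[Voxel]:
    """Voxel set of an axis-aligned box, half-open on each axis."""
    return {(x, y, z)
            for x in range(x0, x1)
            for y in range(y0, y1)
            for z in range(z0, z1)}


if __name__ == "__main__":
    # Two boxes that overlap at voxel (3, 0, 0): a box-level layout collides.
    bed = box(0, 0, 0, 4, 2, 1)             # 8 voxels
    nightstand = box(3, 0, 0, 5, 1, 1)      # 2 voxels, one shared with the bed
    layout = place_sequentially([("bed", bed), ("nightstand", nightstand)])
    print(len(layout["bed"]), len(layout["nightstand"]), is_collision_free(layout))
    # prints: 8 1 True
```

Because every voxel is committed to at most one object, overlap is impossible by construction, even though the two input boxes intersect. Post-hoc clipping can leave an object with a truncated shape, however, which is one plausible motivation for conditioning the generator on the already-occupied context instead of clipping its output.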

📝 Abstract

We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, which achieves state-of-the-art physical plausibility and greater shape diversity than existing layout planners.
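The abstract states that the synthesized voxel grids act as geometric queries for asset retrieval, but it does not specify the matching criterion. The sketch below assumes the simplest one: translation-normalized voxel IoU between the generated occupancy and each candidate asset's voxelization. The names normalize, voxel_iou and retrieve, and the toy sofa assets, are hypothetical.

```python
"""Sketch: retrieve an asset by voxel IoU against a generated occupancy.

Hypothetical illustration of a voxel grid used as a geometric query, NOT the
VoxScene retrieval pipeline. Occupancies are sets of integer (x, y, z) voxels.
"""
from typing import Dict, Set, Tuple

Voxel = Tuple[int, int, int]


def normalize(voxels: Set[Voxel]) -> Set[Voxel]:
    """Translate a voxel set so its minimum corner sits at the origin."""
    if not voxels:
        return set()
    mx = min(v[0] for v in voxels)
    my = min(v[1] for v in voxels)
    mz = min(v[2] for v in voxels)
    return {(x - mx, y - my, z - mz) for (x, y, z) in voxels}


def voxel_iou(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Intersection-over-union of two voxel sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query: Set[Voxel], library: Dict[str, Set[Voxel]]) -> Tuple[str, float]:
    """Return (asset_id, IoU) of the best-matching asset; library must be non-empty."""
    q = normalize(query)
    scores = {name: voxel_iou(q, normalize(vox)) for name, vox in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    straight = {(x, 0, 0) for x in range(4)}               # straight sofa
    l_sofa = straight | {(0, 1, 0), (0, 2, 0)}              # L-shaped sofa
    # Generated L-shaped occupancy placed elsewhere in the scene grid.
    query = {(x + 10, 5, 0) for x in range(4)} | {(10, 6, 0), (10, 7, 0)}
    print(retrieve(query, {"straight_sofa": straight, "l_sofa": l_sofa}))
    # prints: ('l_sofa', 1.0)
```

Translation normalization makes the match independent of where the object sits in the scene. It does not handle rotation or scale, and a practical system might use learned shape embeddings instead. IoU is used here only because it is easy to verify.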

Problem

Research questions and friction points this paper is trying to address.

3D scene synthesis

physical collisions

volumetric representation

scene arrangement

geometric ambiguity

Innovation

Methods, ideas, or system contributions that make the work stand out.

voxel diffusion

anchor-conditioned

collision-free layout