VoxScene: Anchor-Conditioned Voxel Diffusion for Indoor Scene Arrangement

📅 2026-05-16
🤖 AI Summary
Existing data-driven approaches to indoor scene layout typically rely on bounding boxes or implicit representations and overlook volumetric structure, which leads to object collisions and structural entanglement, especially in dense scenes. This work proposes VoxScene, an anchor-conditioned voxel diffusion framework that combines discrete voxel representations with diffusion models to sequentially generate occupancy grids in an explicit, object-centric manner. Each object's occupancy is conditioned on prior anchors and local context, and because a discrete voxel can belong to only one object, the resulting layouts are collision-free. The authors report state-of-the-art physical plausibility and greater shape diversity than existing layout planners, and the generated voxel grids also serve as geometric queries for asset retrieval.
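The exclusivity idea can be illustrated with a minimal sketch. It is not the paper's diffusion model: the hypothetical helpers `box_voxels` and `place_sequentially` stand in for the generator, and each "proposal" is simply a hand-written box of voxels. The only point shown is that when objects are placed one after another into a shared grid, and a voxel may belong to at most one object, overlaps cannot occur.

```python
from itertools import product


def box_voxels(anchor, size):
    """Return the voxel coordinates of an axis-aligned box.

    `anchor` is the minimum corner (an assumed convention for this sketch),
    `size` is the extent along x, y, z in voxels.
    """
    ax, ay, az = anchor
    sx, sy, sz = size
    return {
        (ax + dx, ay + dy, az + dz)
        for dx, dy, dz in product(range(sx), range(sy), range(sz))
    }


def place_sequentially(proposals):
    """Place objects in order; every voxel is owned by at most one object.

    `proposals` is a list of (name, voxel_set) pairs. Voxels already claimed
    by an earlier object are removed from later proposals.
    """
    occupied = set()
    placed = {}
    for name, voxels in proposals:
        kept = voxels - occupied  # exclusivity: drop voxels already claimed
        placed[name] = kept
        occupied |= kept
    return placed


# Two proposals that overlap along x = 3.
bed = box_voxels((0, 0, 0), (4, 2, 1))
nightstand = box_voxels((3, 0, 0), (2, 2, 1))
layout = place_sequentially([("bed", bed), ("nightstand", nightstand)])

# No voxel is shared between the two placed objects.
shared = layout["bed"] & layout["nightstand"]
print(len(layout["bed"]), len(layout["nightstand"]), len(shared))
```

In this toy version the later object is simply trimmed. In the described method the generator is conditioned on the existing context, so it would presumably produce a shape that fits the free space rather than a clipped one.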
📝 Abstract
We present VoxScene, a novel anchor-conditioned voxel diffusion framework tailored for 3D scene synthesis. Current data-driven layout generation techniques typically rely on bounding proxies or implicit representations, which overlook volumetric structures. This geometric blindness inevitably leads to severe physical collisions and structural entanglement, particularly in densely populated environments. To overcome these limitations, we shift the paradigm to an explicit, object-centric voxel representation. Our pipeline sequentially synthesizes discrete volumetric occupancies conditioned on prior anchors and local context. By exploiting the mutually exclusive nature of discrete voxels, our approach eliminates spatial ambiguities and guarantees collision-free arrangements, even in highly complex environments. Furthermore, the synthesized high-fidelity voxel grids serve as discriminative geometric queries for downstream asset retrieval. Extensive experiments demonstrate the universality of our method, achieving state-of-the-art physical plausibility and greater shape diversity than existing layout planners.
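To make the retrieval step concrete, the sketch below uses a generated voxel grid as a query against a small asset library. The abstract does not say which similarity measure is used, so intersection-over-union (IoU) on occupied voxels is an assumption here, and the names `voxel_iou`, `retrieve`, and the two toy assets are hypothetical.

```python
def voxel_iou(a, b):
    """Intersection-over-union of two sets of occupied voxel coordinates."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, assets):
    """Return the name of the asset whose voxels best match the query."""
    return max(assets, key=lambda name: voxel_iou(query, assets[name]))


# Toy asset library, already voxelized on a shared grid (assumed).
assets = {
    "low_table": {(x, y, 0) for x in range(3) for y in range(2)},
    "tall_shelf": {(0, 0, z) for z in range(4)},
}

# A generated shape: the low table with one corner voxel missing.
query = assets["low_table"] - {(2, 1, 0)}

best = retrieve(query, assets)
print(best, voxel_iou(query, assets[best]))
```

A real system would also need to normalize scale and orientation before comparing grids; that step is omitted here.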
Problem

Research questions and friction points this paper is trying to address.

3D scene synthesis
physical collisions
volumetric representation
scene arrangement
geometric ambiguity
Innovation

Methods, ideas, or system contributions that make the work stand out.

voxel diffusion
anchor-conditioned
collision-free layout
explicit 3D representation
scene arrangement
🔎 Similar Papers
Haotian Mao
Shanghai Jiao Tong University, China
Yuhan Huang
Harbin Institute of Technology
transfer learning diagnostic methods for sparse feature
Jiatao Lin
Shanghai Jiao Tong University, China
Yang Zhao
Research Professor, Zhejiang University, China
Intelligent Building, Smart Grid, Fault detection and diagnosis, Energy efficiency
Hui Wang
Shanghai Jiao Tong University, China
Yiheng Zhang
Hong Kong University of Science and Technology, China
Yuwang Wang
Tsinghua University
Deep learning, Representation Learning, 3D Vision
Chenliang Zhou
University of Cambridge
machine learning, generative artificial intelligence, computer vision, computer graphics
Yan Zhang
Shanghai Jiao Tong University, China
Fangcheng Zhong
Peking University, China
Xubo Yang
Shanghai Jiao Tong University, China