SPREAD: Spatial-Physical REasoning via geometry-Aware Diffusion

📅 2026-03-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing data-driven approaches struggle to generate 3D scenes that simultaneously exhibit realistic complexity and physical plausibility due to the absence of structured training data and explicit physical modeling. This work proposes a geometry-aware diffusion model that jointly captures spatial layouts and physical relationships through a graph Transformer, conditioned on pose-augmented scene point clouds to guide the generation process. Furthermore, a differentiable physics-guided mechanism is introduced to enforce collision-free arrangements, semantic relationship constraints, and gravity consistency. By uniquely integrating geometry-aware diffusion with differentiable physics-based optimization, the method achieves state-of-the-art performance in both spatial-relational reasoning and physical plausibility metrics on 3D-FRONT and ProcTHOR, producing scenes that demonstrate remarkable stability and consistency before and after simulation.
📝 Abstract
Automated 3D scene generation is pivotal for applications spanning virtual reality, digital content creation, and Embodied AI. While computer graphics prioritizes aesthetic layouts, vision and robotics demand scenes that mirror real-world complexity, which current data-driven methods struggle to achieve due to limited structured training data and insufficient spatial and physical modeling. We propose SPREAD, a diffusion-based framework that jointly learns spatial and physical relationships through a graph transformer, explicitly conditioning on posed scene point clouds for geometric awareness. Moreover, our model integrates differentiable guidance for collision avoidance, relational constraints, and gravity consistency, ensuring physically coherent scenes without sacrificing relational context. Our experiments on the 3D-FRONT and ProcTHOR datasets demonstrate state-of-the-art performance on spatial-relational reasoning and physical metrics. Moreover, SPREAD outperforms baselines in scene consistency and stability before and after physics simulation, proving its capability to generate simulation-ready environments for embodied AI agents.
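The differentiable physics guidance described in the abstract can be pictured, in loose terms, as a loss-guided reverse-diffusion step: at each denoising iteration the model's update to object poses is corrected by the gradient of a physics penalty, here a collision penalty over axis-aligned object footprints. This is an illustrative sketch only; the function names, the box-overlap penalty, and the guidance form are assumptions, not the paper's actual implementation.

```python
import numpy as np

def collision_penalty(pos, size):
    """Sum of pairwise overlap areas between axis-aligned 2D boxes.

    pos, size: (n, 2) arrays of box centers and widths/depths.
    Zero iff no pair of boxes overlaps (a toy collision-free criterion).
    """
    total = 0.0
    n = pos.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            ox = max(0.0, (size[i, 0] + size[j, 0]) / 2 - abs(pos[i, 0] - pos[j, 0]))
            oy = max(0.0, (size[i, 1] + size[j, 1]) / 2 - abs(pos[i, 1] - pos[j, 1]))
            total += ox * oy
    return total

def penalty_grad(pos, size, eps=1e-4):
    """Central finite-difference gradient of the penalty w.r.t. positions
    (a stand-in for autodiff through a differentiable physics term)."""
    g = np.zeros_like(pos)
    for idx in np.ndindex(*pos.shape):
        p, m = pos.copy(), pos.copy()
        p[idx] += eps
        m[idx] -= eps
        g[idx] = (collision_penalty(p, size) - collision_penalty(m, size)) / (2 * eps)
    return g

def guided_step(pos, size, denoise_update, guidance_scale=0.5):
    """One reverse-diffusion step: the model's denoising update plus a
    gradient-descent correction that pushes objects out of collision."""
    return pos + denoise_update - guidance_scale * penalty_grad(pos, size)
```

Running a few guided steps on two overlapping boxes (with a zero denoising update, to isolate the guidance term) drives the collision penalty toward zero, which is the sense in which such guidance enforces collision-free arrangements during sampling.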
Problem

Research questions and friction points this paper is trying to address.

3D scene generation
spatial reasoning
physical plausibility
Embodied AI
scene realism
Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion model
spatial-physical reasoning
geometric awareness
differentiable physics guidance
3D scene generation
Minzhang Li, ShanghaiTech University
Kuixiang Shao, ShanghaiTech University
Xuebing Li, ShanghaiTech University
Yuyang Jiao, ShanghaiTech University
Yinuo Bai, ShanghaiTech University
Hengan Zhou, ShanghaiTech University
Sixian Shen, ShanghaiTech University
Jiayuan Gu, Assistant Professor, ShanghaiTech University (Embodied AI, 3D Vision)
Jingyi Yu, Professor, ShanghaiTech University (Computer Vision, Computer Graphics)