Learning 3D Object Spatial Relationships from Pre-trained 2D Diffusion Models

📅 2025-03-25
🤖 AI Summary
This work addresses how to extract the implicit geometric priors of pretrained 2D diffusion models in order to model and generate physically plausible 3D object-object spatial relationships (OORs), with a focus on collision-free multi-object layouts. Methodologically, it first uses a 2D diffusion model to synthesize images of object pairs that capture plausible OOR cues, then lifts them into 3D to build a large, diverse OOR dataset. It then trains a score-based diffusion model on the relative spatial relationships of each object pair and extends these pairwise relations to multiple objects by enforcing consistency across pairs and preventing collisions, which scales layout generation from two objects to more complex scenes. Experiments show robust performance across diverse OOR types and demonstrate applicability to real-world 3D scene arrangement. Because the training data come from a 2D generative model rather than a fixed 3D dataset, the approach is not limited to a bounded set of object categories, advancing controllable 3D scene synthesis from 2D foundation models.
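
The summary says 2D relations are lifted into 3D but does not say how. One standard geometric ingredient of any such lifting is back-projecting pixels through a pinhole camera given an estimated depth. The sketch below assumes known intrinsics `K` and per-pixel depth; the function name `backproject` and the example numbers are hypothetical, and the paper's actual uplifting pipeline may work quite differently.

```python
import numpy as np


def backproject(uv, depth, K):
    """Lift pixels uv (N, 2) with depths (N,) to camera-frame 3D points (N, 3).

    Assumption: a pinhole camera with intrinsics K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]
    and depth measured along the optical (z) axis.
    """
    uv = np.asarray(uv, dtype=float)
    depth = np.asarray(depth, dtype=float)
    pix_h = np.hstack([uv, np.ones((uv.shape[0], 1))])  # homogeneous pixels [u, v, 1]
    rays = pix_h @ np.linalg.inv(K).T                    # K^-1 [u, v, 1]^T per row; z = 1
    return rays * depth[:, None]                         # scale each ray to its depth


# Usage with hypothetical intrinsics (fx = fy = 500, principal point at (320, 240)).
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
points = backproject([[320, 240], [820, 240]], [2.0, 2.0], K)
print(points)  # approximately [[0, 0, 2], [2, 0, 2]]
```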

📝 Abstract
We present a method for learning 3D spatial relationships between object pairs, referred to as object-object spatial relationships (OOR), by leveraging synthetically generated 3D samples from pre-trained 2D diffusion models. We hypothesize that images synthesized by 2D diffusion models inherently capture plausible and realistic OOR cues, which provides an efficient way to collect a 3D dataset for learning OOR across a wide, unbounded range of object categories. Our approach begins by synthesizing diverse images that capture plausible OOR cues, which we then uplift into 3D samples. Using this diverse collection of plausible 3D samples for each object pair, we train a score-based OOR diffusion model to learn the distribution of their relative spatial relationships. Additionally, we extend pairwise OOR to multi-object OOR by enforcing consistency across pairwise relations and preventing object collisions. Extensive experiments demonstrate the robustness of our method across various object-object spatial relationships, along with its applicability to real-world 3D scene arrangement tasks using the OOR diffusion model.
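
The abstract describes OOR as the distribution of an object pair's relative spatial relationships. A minimal sketch, assuming the relation is encoded as the rigid pose of one object expressed in the other's frame (a 4x4 homogeneous transform); the helper names `make_pose`, `rot_z`, and `relative_pose` are hypothetical, and the paper's exact parametrization (for example, whether scale is included) is not stated here. The useful property is that the relation does not change when the whole scene is moved.

```python
import numpy as np


def make_pose(R, t):
    """Pack a 3x3 rotation R and a translation t into a 4x4 homogeneous transform."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(R, dtype=float)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def rot_z(theta):
    """Rotation by theta radians about the z axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def relative_pose(T_base, T_target):
    """Assumed OOR encoding: pose of the target in the base object's frame."""
    return np.linalg.inv(T_base) @ T_target


# Usage: a table rotated 90 degrees about z at (1, 0, 0) and a cup at (1, 2, 0.8).
T_table = make_pose(rot_z(np.pi / 2), [1.0, 0.0, 0.0])
T_cup = make_pose(np.eye(3), [1.0, 2.0, 0.8])
T_rel = relative_pose(T_table, T_cup)
print(T_rel[:3, 3])  # cup position in the table frame, approximately (2, 0, 0.8)

# Moving the whole scene by the same rigid transform G leaves the relation unchanged.
G = make_pose(rot_z(0.3), [5.0, -2.0, 0.0])
print(np.allclose(relative_pose(G @ T_table, G @ T_cup), T_rel))
```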
Problem

Research questions and friction points this paper is trying to address.

Learning 3D spatial relationships from 2D diffusion models
Generating diverse 3D samples for object-object spatial relationships
Extending pairwise spatial relationships to multi-object scenarios
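
Per the abstract, extending pairwise OOR to multiple objects means enforcing consistency across pairwise relations and preventing collisions. The sketch below shows two quantities an optimizer could penalize under assumed conventions: a cycle-consistency error for relative transforms defined as `T_xy = inv(T_x) @ T_y` (so a consistent triple satisfies `T_ac == T_ab @ T_bc`), and the overlap volume of axis-aligned bounding boxes as a crude collision proxy. The function names and these specific penalties are illustrative assumptions, not the paper's objective.

```python
import numpy as np


def translation(t):
    """4x4 homogeneous transform with identity rotation and translation t."""
    T = np.eye(4)
    T[:3, 3] = np.asarray(t, dtype=float)
    return T


def rel(T_x, T_y):
    """Assumed convention: pose of object y expressed in object x's frame."""
    return np.linalg.inv(T_x) @ T_y


def consistency_error(T_ab, T_bc, T_ac):
    """Frobenius norm of the mismatch between the composed relation and the direct one."""
    return float(np.linalg.norm(T_ab @ T_bc - T_ac))


def aabb_overlap_volume(min_a, max_a, min_b, max_b):
    """Intersection volume of two axis-aligned boxes; 0.0 means no collision."""
    lo = np.maximum(np.asarray(min_a, dtype=float), np.asarray(min_b, dtype=float))
    hi = np.minimum(np.asarray(max_a, dtype=float), np.asarray(max_b, dtype=float))
    extent = np.clip(hi - lo, 0.0, None)  # a non-positive extent on any axis means no overlap
    return float(np.prod(extent))


# Usage: relations read off one layout are consistent; an arbitrary T_ac is not.
T_a, T_b, T_c = translation([0, 0, 0]), translation([1, 0, 0]), translation([1, 2, 0])
print(consistency_error(rel(T_a, T_b), rel(T_b, T_c), rel(T_a, T_c)))          # ~0.0
print(consistency_error(rel(T_a, T_b), rel(T_b, T_c), translation([0, 0, 0])))  # > 0
print(aabb_overlap_volume([0, 0, 0], [1, 1, 1], [0.5, 0.5, 0.5], [1.5, 1.5, 1.5]))  # 0.125
```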
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverage pre-trained 2D diffusion models
Uplift 2D cues to 3D spatial samples
Train score-based OOR diffusion model
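
Training a score-based OOR diffusion model means learning the score (the gradient of the log density) of noise-perturbed relative poses and then sampling by following it. A toy sketch of that technique, assuming each sample is a flat vector: a denoising score matching loss at one noise level, and annealed Langevin sampling in the style of noise-conditional score networks. To stay self-contained, a closed-form Gaussian score stands in for a trained network; all names and hyperparameters (`step_scale`, 10 noise levels from 1.0 to 0.01) are hypothetical and not taken from the paper.

```python
import numpy as np


def dsm_loss(score_fn, x, sigma, rng):
    """Denoising score matching loss at a single noise level sigma.

    Clean samples x (N, D) are perturbed as x + sigma * eps. The score of the
    Gaussian perturbation kernel at the noisy point is -eps / sigma, which is the
    regression target. The sigma**2 weighting keeps levels on a comparable scale.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -eps / sigma
    pred = score_fn(x_noisy, sigma)
    return float(np.mean(np.sum((sigma * (pred - target)) ** 2, axis=1)))


def annealed_langevin(score_fn, x0, sigmas, steps_per_sigma=100, step_scale=2e-5, rng=None):
    """Annealed Langevin dynamics: Langevin steps at each noise level, largest sigma first."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)
    for sigma in sigmas:
        # Step size shrinks with the noise level: alpha = step_scale * (sigma / sigma_min)**2.
        alpha = step_scale * (sigma / sigmas[-1]) ** 2
        for _ in range(steps_per_sigma):
            z = rng.standard_normal(x.shape)
            x = x + 0.5 * alpha * score_fn(x, sigma) + np.sqrt(alpha) * z
    return x


def gaussian_score(mu, s):
    """Exact score of N(mu, s^2 I) smoothed by N(0, sigma^2 I); stands in for a network."""
    mu = np.asarray(mu, dtype=float)
    return lambda x, sigma: -(x - mu) / (s ** 2 + sigma ** 2)


def zero_score(x, sigma):
    """Baseline that predicts a zero score everywhere."""
    return np.zeros_like(x)


# Usage on toy 2D "relative poses" drawn from N(mu, 0.5^2 I).
rng = np.random.default_rng(0)
mu, s = np.array([3.0, -1.0]), 0.5
data = mu + s * rng.standard_normal((5000, 2))
score = gaussian_score(mu, s)

loss_exact = dsm_loss(score, data, 0.5, np.random.default_rng(1))
loss_zero = dsm_loss(zero_score, data, 0.5, np.random.default_rng(1))
print(loss_exact, loss_zero)  # the exact score should give the lower loss

sigmas = np.geomspace(1.0, 0.01, 10)
samples = annealed_langevin(score, rng.standard_normal((2000, 2)), sigmas, rng=rng)
print(samples.mean(axis=0), samples.std(axis=0))  # mean near mu, std near 0.5
```

In a real model, `score_fn` would be a neural network conditioned on sigma (and, presumably, on the object pair), trained by minimizing `dsm_loss` averaged over noise levels.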
Sangwon Beak
Seoul National University
Hyeonwoo Kim
Seoul National University
Hanbyul Joo
Assistant Professor, Seoul National University
Computer Vision, AI, Modeling Social Signals, Graphics