🤖 AI Summary
Existing 3D scene generation methods struggle to model logical dependencies and physical constraints among objects, resulting in scenes with limited dynamic adaptability and realism. To address this, we propose CausalStruct, the first framework to integrate causal reasoning into 3D scene generation. It leverages large language models to construct object-level causal graphs that explicitly encode semantic dependencies and spatial constraints; combines causal intervention with a PID controller for controllable layout optimization; and employs 3D Gaussian Splatting coupled with Score Distillation Sampling for high-fidelity rendering. Experiments demonstrate that CausalStruct significantly outperforms state-of-the-art methods in logical consistency, physical plausibility, and controllability under text- or image-based guidance. The framework enables interpretable, editable, and photorealistic 3D scene generation while preserving structural coherence and physical validity.
📝 Abstract
Existing 3D scene generation methods often struggle to model the complex logical dependencies and physical constraints between objects, limiting their ability to adapt to dynamic and realistic environments. We propose CausalStruct, a novel framework that embeds causal reasoning into 3D scene generation. Using large language models (LLMs), we construct causal graphs in which nodes represent objects and attributes and edges encode causal dependencies and physical constraints. CausalStruct iteratively refines the scene layout: it uses the causal order to determine the order in which objects are placed, and applies causal intervention to adjust the spatial configuration according to physics-driven constraints, ensuring consistency with textual descriptions and real-world dynamics. The refined scene causal graph informs subsequent optimization steps, in which a Proportional-Integral-Derivative (PID) controller iteratively tunes object scales and positions. Our method uses text or images to guide object placement and layout in 3D scenes, with 3D Gaussian Splatting and Score Distillation Sampling improving shape accuracy and rendering stability. Extensive experiments show that CausalStruct generates 3D scenes with enhanced logical coherence, realistic spatial interactions, and robust adaptability.
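As a rough illustration of two ideas named in the abstract, the minimal sketch below derives a placement order from a causal graph by topological sorting and then uses a textbook discrete PID loop to pull one scalar layout parameter toward a target. It is not the authors' implementation: the toy scene, the `placement_order` and `refine` helpers, the gains, and the use of a single scalar error are all assumptions made for illustration, since the abstract does not specify the error signal or the controller settings.

```python
from dataclasses import dataclass
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def placement_order(depends_on):
    """Return objects so that each one comes after everything it depends on."""
    return list(TopologicalSorter(depends_on).static_order())


@dataclass
class PID:
    """Textbook discrete PID controller (gains are illustrative assumptions)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error, dt=1.0):
        # Accumulate the integral term and estimate the derivative term.
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def refine(value, target, pid, iterations=200):
    """Iteratively move a scalar layout parameter toward its target."""
    for _ in range(iterations):
        value += pid.step(target - value)
    return value


# Hypothetical scene: the table rests on the floor, the cup rests on the table.
# Each key maps an object to the set of objects it causally depends on.
depends_on = {"floor": set(), "table": {"floor"}, "cup": {"table"}}
order = placement_order(depends_on)

# Hypothetical target: the cup's base should sit at the tabletop height (0.75).
cup_height = refine(value=0.0, target=0.75, pid=PID(kp=0.5, ki=0.05, kd=0.1))

print(order)
print(round(cup_height, 3))
```

In a full pipeline the error would presumably come from a constraint check on the scene (for example, a penetration depth or a support gap) rather than from a known target value, and the controller would run per object in the order returned by `placement_order`.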