🤖 AI Summary
Existing methods struggle to generate consistent, controllable 3D indoor scenes under variable node counts, diverse edge configurations, and interactive editing. This paper proposes EchoScene, a scene-graph-based dual-branch diffusion model built around an "information echoing" mechanism: each node runs its own denoising process, and the shape and layout branches are co-updated through graph convolution. It unifies global graph-structural constraints with local editability within a single diffusion framework. The model incorporates scene-graph-driven noise sampling, node-correlated denoising, and a graph-convolutional information exchange module, supporting fine-grained interactive editing and globally coherent generation. Experiments demonstrate that EchoScene surpasses state-of-the-art methods in geometric quality, fidelity, and controllability. Moreover, its outputs integrate seamlessly with commercial texture generation tools.
📝 Abstract
We present EchoScene, an interactive and controllable generative model that generates 3D indoor scenes from scene graphs. EchoScene leverages a dual-branch diffusion model that dynamically adapts to scene graphs. Existing methods struggle with scene graphs due to varying numbers of nodes, multiple edge combinations, and manipulator-induced node-edge operations. EchoScene overcomes this by associating each node with a denoising process and enabling collaborative information exchange, which improves controllable and consistent generation under global constraints. This is achieved through an information echo scheme in both the shape and layout branches: at every denoising step, all processes share their denoising data with an information exchange unit that combines these updates using graph convolution. The scheme ensures that each denoising process is informed by a holistic understanding of the scene graph, facilitating the generation of globally coherent scenes. The resulting scenes can be manipulated during inference by editing the input scene graph and resampling the noise in the diffusion model. Extensive experiments validate our approach, which maintains scene controllability and surpasses previous methods in generation fidelity. Moreover, the generated scenes are of high quality and thus directly compatible with off-the-shelf texture generation. Code and trained models are open-sourced.
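The echo scheme described above, one denoising process per graph node, with a graph-convolutional exchange of intermediate states at every step, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the mean-aggregating `graph_conv`, the placeholder per-node denoiser, and all function names are assumptions standing in for the paper's learned networks.

```python
import numpy as np

def graph_conv(states, adj):
    # Illustrative stand-in for the learned information exchange unit:
    # each node averages its own state with its neighbors' states,
    # so every local update is tempered by the rest of the graph.
    deg = adj.sum(axis=1, keepdims=True) + 1.0
    return (states + adj @ states) / deg

def echo_denoise(adj, dim=4, steps=10, seed=0):
    # adj: (N, N) symmetric adjacency of the scene graph (nodes = objects).
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    x = rng.normal(size=(n, dim))      # one noisy latent per node
    for _ in range(steps):
        eps_hat = 0.1 * x              # placeholder per-node denoiser
        x = x - eps_hat                # local denoising update
        x = graph_conv(x, adj)         # "echo": share updates graph-wide
    return x                           # globally informed node latents

# Example: a 3-object scene graph connected in a chain.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
latents = echo_denoise(adj)            # shape (3, 4)
```

The key point the sketch captures is the ordering: exchange happens inside the denoising loop, at every step, rather than once at the end, which is what lets each node's trajectory respect global graph constraints.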