MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation

📅 2025-02-09
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Existing graph-structured 3D scene generation methods support only textual input, limiting geometric controllability and fine-grained editing. Targeting virtual reality and interior design, this work proposes the first geometry-controllable indoor scene generation framework that accepts hybrid text–vision input. Methodologically, it constructs a mixed-modality graph that integrates textual and visual node representations; designs a relation predictor to infer implicit spatial relationships from node features; incorporates a visual enhancement module to improve the geometric fidelity of text-only nodes; and generates scenes with a dual-branch diffusion model. Experiments demonstrate state-of-the-art performance in geometric controllability and layout plausibility, significantly improving the precision with which user instructions map to 3D layouts. The framework enables multi-granularity, interactive scene editing while preserving structural coherence and spatial consistency.

πŸ“ Abstract
Controllable 3D scene generation has extensive applications in virtual reality and interior design, where the generated scenes should exhibit high levels of realism and controllability in terms of geometry. Scene graphs provide a suitable data representation that facilitates these applications. However, current graph-based methods for scene generation are constrained to text-based inputs and exhibit insufficient adaptability to flexible user inputs, hindering the ability to precisely control object geometry. To address this issue, we propose MMGDreamer, a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor. The mixed-modality graph allows object nodes to integrate textual and visual modalities, with optional relationships between nodes. It enhances adaptability to flexible user inputs and enables meticulous control over the geometry of objects in the generated scenes. The visual enhancement module enriches the visual fidelity of text-only nodes by constructing visual representations using text embeddings. Furthermore, our relation predictor leverages node representations to infer absent relationships between nodes, resulting in more coherent scene layouts. Extensive experimental results demonstrate that MMGDreamer exhibits superior control of object geometry, achieving state-of-the-art scene generation performance. Project page: https://yangzhifeio.github.io/project/MMGDreamer.
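The abstract describes a pipeline in which object nodes may carry a text embedding, a visual embedding, or both; a visual enhancement module synthesizes visual features for text-only nodes; and a relation predictor infers edges the user left unspecified. The following is a minimal, hypothetical sketch of that data flow only — the toy dimensions, random weights, tanh projection, and cosine-similarity thresholding are illustrative stand-ins, not the paper's actual modules:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

import numpy as np

D = 8  # toy feature dimension (illustrative, not from the paper)

@dataclass
class Node:
    label: str
    text_feat: Optional[np.ndarray] = None    # present if the user gave text
    visual_feat: Optional[np.ndarray] = None  # present if the user gave an image

@dataclass
class MixedModalityGraph:
    nodes: List[Node]
    # (i, j) -> relation label; only user-specified edges to start with
    edges: Dict[Tuple[int, int], str] = field(default_factory=dict)

def visual_enhance(node: Node, W: np.ndarray) -> Node:
    # Stand-in for the visual enhancement module: construct a visual
    # representation for a text-only node from its text embedding.
    if node.visual_feat is None and node.text_feat is not None:
        node.visual_feat = np.tanh(W @ node.text_feat)
    return node

def predict_missing_relations(g: MixedModalityGraph) -> MixedModalityGraph:
    # Stand-in relation predictor: assign a coarse relation to every node
    # pair with no user-specified edge, based on node-feature similarity.
    for i in range(len(g.nodes)):
        for j in range(i + 1, len(g.nodes)):
            if (i, j) not in g.edges:
                fa, fb = g.nodes[i].visual_feat, g.nodes[j].visual_feat
                sim = float(fa @ fb) / (np.linalg.norm(fa) * np.linalg.norm(fb))
                g.edges[(i, j)] = "close to" if sim > 0 else "apart from"
    return g

rng = np.random.default_rng(0)
W = rng.standard_normal((D, D)) / np.sqrt(D)

g = MixedModalityGraph(nodes=[
    Node("sofa", text_feat=rng.standard_normal(D)),        # text-only node
    Node("table", text_feat=rng.standard_normal(D),
         visual_feat=rng.standard_normal(D)),              # text + image node
])
g.nodes = [visual_enhance(n, W) for n in g.nodes]  # all nodes gain visual features
g = predict_missing_relations(g)                   # all pairs gain a relation
print(g.edges)
```

The point of the sketch is the flexibility the mixed-modality graph affords: any subset of nodes can be grounded in images for precise geometry, while text-only nodes and missing relations are completed automatically before the (here omitted) dual-branch diffusion stage.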
Problem

Research questions and friction points this paper is trying to address.

Enhance 3D scene geometry control
Integrate textual and visual modalities
Improve scene generation adaptability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-branch diffusion model
Mixed-Modality Graph integration
Visual enhancement module
🔎 Similar Papers
2024-05-02 · European Conference on Computer Vision · Citations: 6
Zhifei Yang
Peking University
3D Generation · Generative Models

Keyang Lu
School of Artificial Intelligence, Beihang University

Chao Zhang
Beijing Digital Native Digital City Research Center

Jiaxing Qi
BUAA
AIOps · Software Engineering · Data Mining · AI4Science

Hanqi Jiang
University of Georgia
Medical Image Analysis · Multi-modal Large Language Models

Ruifei Ma
Beijing Digital Native Digital City Research Center

Shenglin Yin
School of Computer Science, Peking University

Yifan Xu
School of Computer Science and Engineering, Beihang University

Mingzhe Xing
Peking University
AI Agent · AI for Software Engineering · AI for System

Zhen Xiao
Peking University
distributed systems · cloud computing · machine learning

Jieyi Long
Northwestern University
Blockchain · Distributed System · Generative AI · EDA

Xiangde Liu
Beijing Digital Native Digital City Research Center

Guangyao Zhai
Technical University of Munich; ETH Zurich
Generative AI · Embodied AI