🤖 AI Summary
Existing 3D indoor layout generation methods suffer from either poor generalization (traditional approaches) or insufficient physical plausibility (LLM/VLM-driven methods). To address these limitations, the authors propose a multi-agent framework, described as the first of its kind, that decouples semantic understanding from physical constraints and optimizes the two in coordination: a Planner invokes a semantic optimization tool to refine abstract spatial relationships; a Designer employs a physics-aware optimization tool, based on grid matching, to resolve geometric conflicts; and an Evaluator provides closed-loop feedback. The two optimization tools operate independently and iteratively, coordinated dynamically via a task-scheduling mechanism, which yields a divide-and-conquer joint optimization of semantic comprehension and geometric constraints. Evaluated on standard benchmarks, the method achieves state-of-the-art performance, with significant improvements in the physical plausibility of layouts, visual realism, and cross-scene generalization, enabling high-fidelity construction of complex indoor virtual environments.
📝 Abstract
3D indoor layout synthesis is crucial for creating virtual environments. Traditional methods struggle with generalization due to fixed datasets. While recent LLM and VLM-based approaches offer improved semantic richness, they often lack robust and flexible refinement, resulting in suboptimal layouts. We develop DisCo-Layout, a novel framework that disentangles and coordinates physical and semantic refinement. For independent refinement, our Semantic Refinement Tool (SRT) corrects abstract object relationships, while the Physical Refinement Tool (PRT) resolves concrete spatial issues via a grid-matching algorithm. For collaborative refinement, a multi-agent framework intelligently orchestrates these tools, featuring a planner for placement rules, a designer for initial layouts, and an evaluator for assessment. Experiments demonstrate DisCo-Layout's state-of-the-art performance, generating realistic, coherent, and generalizable 3D indoor layouts. Our code will be publicly available.
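To make the idea of resolving concrete spatial issues on a grid more tangible, here is a minimal Python sketch. It is not the paper's PRT: the abstract does not specify the grid-matching algorithm, so the `Box` footprint class, the `resolve_on_grid` function, the cell size, and the nearest-free-cell criterion are all assumptions chosen for illustration. The sketch snaps a colliding object to the closest grid position whose footprint stays inside the room and overlaps no already-placed object.

```python
from dataclasses import dataclass
from itertools import product
from typing import Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned floor footprint (hypothetical representation, in meters)."""
    x: float  # min-x corner
    y: float  # min-y corner
    w: float  # extent along x
    d: float  # extent along y

    def overlaps(self, other: "Box") -> bool:
        # Strict inequalities: boxes that merely touch do not count as colliding.
        return (self.x < other.x + other.w and other.x < self.x + self.w
                and self.y < other.y + other.d and other.y < self.y + self.d)


def resolve_on_grid(target: Box, placed: Sequence[Box],
                    room_w: float, room_d: float,
                    cell: float = 0.25) -> Optional[Box]:
    """Return the collision-free grid position closest to `target`, or None.

    Assumption: candidates are the min-corner positions on a regular grid
    for which the footprint fits entirely inside the room.
    """
    if target.w > room_w or target.d > room_d:
        return None  # the object cannot fit in the room at all
    nx = int((room_w - target.w) / cell + 1e-9) + 1
    ny = int((room_d - target.d) / cell + 1e-9) + 1

    best: Optional[Box] = None
    best_dist = float("inf")
    for i, j in product(range(nx), range(ny)):
        cand = Box(i * cell, j * cell, target.w, target.d)
        if any(cand.overlaps(p) for p in placed):
            continue  # this cell would collide with an existing object
        dist = (cand.x - target.x) ** 2 + (cand.y - target.y) ** 2
        if dist < best_dist:
            best, best_dist = cand, dist
    return best


# Usage: a 1 m x 1 m object proposed inside an occupied 2 m x 2 m region.
sofa = Box(0.0, 0.0, 2.0, 2.0)
proposed = Box(1.0, 1.0, 1.0, 1.0)
fixed = resolve_on_grid(proposed, [sofa], room_w=4.0, room_d=4.0, cell=0.5)
```

An exhaustive scan like this is only practical for small rooms and coarse grids; it is meant to show the separation of concerns, where a purely geometric routine repairs collisions without needing to reason about semantic relationships between objects.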