🤖 AI Summary
Existing 3D scene generation methods lack fine-grained control over individual scene elements, which makes it hard to satisfy the complex spatial and semantic constraints of real-world applications. Method: We propose Scenethesis, a requirement-sensitive, multi-stage synthesis framework built on ScenethesisLang, a novel intermediate representation language with expressive constraint modeling that enables a precise, traceable mapping from natural language requirements to executable 3D code. The approach combines domain-specific language (DSL) design, formal constraint modeling, and program synthesis to explicitly encode, verify, and iteratively refine constraints, with BLIP-2-based visual evaluation of the results. Contribution/Results: Experiments show that the framework accurately captures over 80% of user requirements, satisfies more than 90% of hard constraints while handling over 100 constraints simultaneously, and achieves a 42.8% improvement in BLIP-2 visual evaluation score over the state-of-the-art method, improving controllability, interpretability, and formal verifiability.
📝 Abstract
Graphical user interface (UI) software has undergone a fundamental transformation from traditional two-dimensional (2D) desktop/web/mobile interfaces to spatial three-dimensional (3D) environments. While existing work has achieved remarkable success in automated 2D software generation, such as HTML/CSS and mobile app interface code synthesis, the generation of 3D software remains under-explored. Current methods for 3D software generation usually generate the 3D environment as a whole and cannot modify or control specific elements in the software. Furthermore, these methods struggle to handle the complex spatial and semantic constraints inherent in the real world. To address these challenges, we present Scenethesis, a novel requirement-sensitive 3D software synthesis approach that maintains formal traceability between user specifications and generated 3D software. Scenethesis is built upon ScenethesisLang, a domain-specific language that serves as a granular, constraint-aware intermediate representation (IR) to bridge natural language requirements and executable 3D software. It serves both as a comprehensive scene description language that enables fine-grained modification of 3D software elements and as a formal specification language capable of expressing complex spatial constraints. By decomposing 3D software synthesis into stages that operate on ScenethesisLang, Scenethesis enables independent verification, targeted modification, and systematic constraint satisfaction. Our evaluation demonstrates that Scenethesis accurately captures over 80% of user requirements and satisfies more than 90% of hard constraints while handling over 100 constraints simultaneously. Furthermore, Scenethesis achieves a 42.8% improvement in BLIP-2 visual evaluation scores compared to the state-of-the-art method.
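To make the idea of a constraint-aware intermediate representation more concrete, the following is a minimal Python sketch of how a scene with named objects and explicitly checkable hard constraints might be represented. It is not ScenethesisLang and does not reflect its actual syntax or semantics, which the abstract does not specify. Every name here (`SceneObject`, `Constraint`, `within`, `on_floor`, `hard_satisfaction_rate`, `violated`) is a hypothetical choice made for illustration only.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical illustration only: NOT the actual ScenethesisLang representation.

@dataclass
class SceneObject:
    name: str
    position: Tuple[float, float, float]  # (x, y, z) in metres, y is "up" (assumed convention)


Scene = Dict[str, SceneObject]


@dataclass
class Constraint:
    description: str                 # human-readable text, keeps the link to the requirement
    hard: bool                       # hard constraints must hold; soft ones are preferences
    check: Callable[[Scene], bool]   # predicate evaluated against a concrete scene


def within(a: str, b: str, max_dist: float) -> Constraint:
    """Hard constraint: object `a` lies within `max_dist` metres of object `b`."""
    return Constraint(
        description=f"{a} within {max_dist} m of {b}",
        hard=True,
        check=lambda s: math.dist(s[a].position, s[b].position) <= max_dist,
    )


def on_floor(name: str) -> Constraint:
    """Hard constraint: the object's height coordinate is zero."""
    return Constraint(
        description=f"{name} rests on the floor",
        hard=True,
        check=lambda s: s[name].position[1] == 0.0,
    )


def hard_satisfaction_rate(scene: Scene, constraints: List[Constraint]) -> float:
    """Fraction of hard constraints that hold in the given scene."""
    hard = [c for c in constraints if c.hard]
    if not hard:
        return 1.0
    return sum(c.check(scene) for c in hard) / len(hard)


def violated(scene: Scene, constraints: List[Constraint]) -> List[str]:
    """Descriptions of failing constraints, usable for targeted modification."""
    return [c.description for c in constraints if not c.check(scene)]


# Example scene: the lamp is deliberately placed too far from the desk.
scene = {
    "desk": SceneObject("desk", (0.0, 0.0, 0.0)),
    "chair": SceneObject("chair", (0.5, 0.0, 0.5)),
    "lamp": SceneObject("lamp", (3.0, 0.0, 0.0)),
}
constraints = [
    within("chair", "desk", 1.0),
    within("lamp", "desk", 1.0),
    on_floor("chair"),
]

rate = hard_satisfaction_rate(scene, constraints)
failing = violated(scene, constraints)
```

Because each constraint carries its own description and predicate, a failing constraint can be traced back to a single requirement and a single object, which is the kind of independent verification and targeted modification the abstract attributes to operating on an intermediate representation rather than on a monolithic generated scene.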