🤖 AI Summary
To address geometric distortion, disproportionate scaling, and loss of structural details (particularly in doors and windows) when generating photorealistic renderings from school building sketches, this paper proposes a structure-aware diffusion model. The method integrates architectural component priors: a building semantic component encoder explicitly models semantic parts such as doors and windows, while a component-level retrieval-augmented mechanism injects sketch-space structural information into the diffusion process. In addition, joint sketch–text conditioning improves the controllability of generation. Evaluated on a real-world school building dataset, the approach significantly improves geometric fidelity and layout plausibility: FID decreases by 28%, and user-intent alignment increases by 41%. These gains help the generated renderings faithfully represent architects’ design intent during early-stage conceptual development.
📝 Abstract
Generative Artificial Intelligence (AI) has advanced rapidly, enabling the generation of renderings from architectural sketches. This progress has significantly improved the efficiency of communication and conceptual expression during the early stage of architectural design. However, generated images often fail to preserve the structural details of architects' sketches. Sketches typically emphasize the overall structure, so crucial components such as windows and doors are often represented by simple lines or omitted entirely. For school buildings, controlling architectural components, such as the shape and proportion of windows, is essential, because these factors directly determine how accurately the generated images reflect the architect's design intentions. To address this issue, we propose a structure-aware diffusion model for architectural image generation that uses retrieval augmentation to express design intentions more precisely. Our framework uses architectural components to enhance the generation process, supplying details that the sketches may lack. These components provide clear spatial and structural information, improving the model's ability to interpret and generate architectural details. The refined sketches, combined with text prompts, are fed into the proposed structure-aware diffusion model to generate detailed and realistic school building images. Experimental results demonstrate the effectiveness of our framework in generating architectural designs.
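The abstract does not state how architectural components are retrieved. As one possible illustration of component-level retrieval, here is a minimal sketch assuming that each sketched region and each library component has already been encoded as a fixed-length embedding, and that the closest component within the same semantic category (e.g., "window") is found by cosine similarity. The `Component` class, the `retrieve` function, the labels, and the vectors are all hypothetical and are not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical library entry: a labeled architectural component and its embedding."""
    label: str              # e.g. "window_tall", "door_single"
    category: str           # semantic part type, e.g. "window" or "door"
    embedding: list[float]  # assumed output of some component encoder


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: list[float], library: list[Component],
             category: str, k: int = 1) -> list[Component]:
    """Return the k components of the given category most similar to the query."""
    candidates = [c for c in library if c.category == category]
    candidates.sort(key=lambda c: cosine(query, c.embedding), reverse=True)
    return candidates[:k]


# Toy library with made-up 3-D embeddings (illustration only).
library = [
    Component("window_tall", "window", [0.2, 1.0, 0.0]),
    Component("window_wide", "window", [1.0, 0.3, 0.0]),
    Component("door_single", "door", [0.1, 0.9, 1.0]),
]

# Hypothetical embedding of a tall, narrow window stroke in the sketch.
query = [0.25, 0.95, 0.05]
print([c.label for c in retrieve(query, library, "window")])  # ['window_tall']
```

Restricting the search to one semantic category keeps a tall window stroke from being matched to a door with a similar aspect ratio; how the retrieved component is then injected into the diffusion process is not specified here.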