Canvas3D: Empowering Precise Spatial Control for Image Generation with Constraints from a 3D Virtual Canvas

📅 2025-08-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Generative AI has long struggled to offer precise control over spatial composition in image generation, particularly fine-grained, interactive manipulation of object layout and scene conditions. To address this, we propose a virtual canvas framework built on a real-time 3D engine. It parses textual input into manipulable 3D object instances, lets users interact with them directly (drag-and-drop, scaling, and constraint specification), and translates the resulting spatial intent into structured geometric constraints that guide diffusion-based image synthesis. Our key contribution is the first integration of a real-time interactive 3D engine as a human-AI co-design medium for spatial modeling, unifying natural language understanding, 3D instantiation, spatial relation reasoning, and controllable image generation. Experiments show significant improvements over state-of-the-art baselines: +32.7% IoU in spatial accuracy, a 41% reduction in task completion time, and higher user satisfaction, validated through open-ended real-world usability testing.
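The summary reports spatial accuracy as IoU but does not say what is being compared. A minimal sketch, assuming IoU is computed between the 2D box a user intended for each object and the box a detector finds for that object in the generated image (both hypothetical inputs here, paired by index), then averaged over objects. Whether the reported +32.7% is an absolute or relative gain is not stated, so the sketch only covers the metric itself.

```python
from typing import NamedTuple, Sequence


class Box(NamedTuple):
    """Axis-aligned box in normalized image coordinates: (x0, y0) top-left, (x1, y1) bottom-right."""
    x0: float
    y0: float
    x1: float
    y1: float


def area(b: Box) -> float:
    # Degenerate or inverted boxes (e.g. an empty intersection) have zero area.
    return max(0.0, b.x1 - b.x0) * max(0.0, b.y1 - b.y0)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter = area(Box(max(a.x0, b.x0), max(a.y0, b.y0), min(a.x1, b.x1), min(a.y1, b.y1)))
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def mean_iou(targets: Sequence[Box], detections: Sequence[Box]) -> float:
    """Average IoU over (intended box, detected box) pairs, matched by position (hypothetical pairing)."""
    scores = [iou(t, d) for t, d in zip(targets, detections)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    intended = Box(0.0, 0.0, 0.5, 0.5)
    detected = Box(0.25, 0.25, 0.75, 0.75)
    print(round(iou(intended, detected), 3))  # 0.143 (exactly 1/7)
```

A full evaluation would also need a rule for objects the detector misses; here a missing detection would simply have to be passed in as an empty box, which scores 0.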

📝 Abstract
Generative AI (GenAI) has significantly improved the ease and flexibility of image creation. However, precisely controlling spatial composition, including object arrangement and scene conditions, remains a challenge. To bridge this gap, we propose Canvas3D, an interactive system that leverages a 3D engine to enable precise spatial manipulation for image generation. Given a user prompt, Canvas3D automatically converts the textual description into interactive objects within a 3D engine-driven virtual canvas, enabling direct and precise spatial configuration. These user-defined arrangements produce explicit spatial constraints that guide generative models to accurately reflect user intentions in the resulting images. We conducted a closed-ended comparative study between Canvas3D and a baseline system, as well as an open-ended study to evaluate our system "in the wild". The results indicate that Canvas3D outperforms the baseline in spatial control, interactivity, and overall user experience.
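The abstract says user-defined arrangements become explicit spatial constraints but does not specify their form. One common form in layout-conditioned diffusion (for example, GLIGEN-style grounding) is a text phrase plus a normalized 2D bounding box per object. The sketch below assumes that form and is not Canvas3D's actual method: it projects each object's 3D box from the canvas through a simple pinhole camera (OpenCV convention: x right, y down, z forward) and takes the 2D extent of the eight projected corners. The names `Object3D`, `Camera`, `to_layout_box`, and `scene_to_constraints` are hypothetical.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Object3D:
    """Hypothetical canvas object: a text label plus an axis-aligned 3D box in camera coordinates."""
    label: str
    center: tuple[float, float, float]  # (x, y, z); x right, y down, z forward
    size: tuple[float, float, float]    # (width, height, depth)


@dataclass
class Camera:
    """Hypothetical pinhole camera with the principal point at the image center."""
    focal: float  # focal length in pixels
    width: int
    height: int


def project_point(cam: Camera, p: tuple[float, float, float]) -> tuple[float, float]:
    """Perspective-project a camera-space point to pixel coordinates (u, v)."""
    x, y, z = p
    if z <= 0:
        raise ValueError("point is at or behind the camera plane")
    return cam.focal * x / z + cam.width / 2, cam.focal * y / z + cam.height / 2


def _clamp01(t: float) -> float:
    return min(1.0, max(0.0, t))


def to_layout_box(cam: Camera, obj: Object3D) -> tuple[float, float, float, float]:
    """Normalized (x0, y0, x1, y1) box covering all eight projected corners, clipped to the frame."""
    cx, cy, cz = obj.center
    hw, hh, hd = (s / 2 for s in obj.size)
    corners = [(cx + dx, cy + dy, cz + dz)
               for dx, dy, dz in product((-hw, hw), (-hh, hh), (-hd, hd))]
    us, vs = zip(*(project_point(cam, c) for c in corners))
    return (_clamp01(min(us) / cam.width), _clamp01(min(vs) / cam.height),
            _clamp01(max(us) / cam.width), _clamp01(max(vs) / cam.height))


def scene_to_constraints(cam: Camera, objects: list[Object3D]) -> list[dict]:
    """One {phrase, box} constraint per canvas object (assumed constraint format)."""
    return [{"phrase": o.label, "box": to_layout_box(cam, o)} for o in objects]


if __name__ == "__main__":
    cam = Camera(focal=500.0, width=512, height=512)
    scene = [Object3D("a red cube", (0.0, 0.0, 5.0), (2.0, 2.0, 2.0))]
    print(scene_to_constraints(cam, scene))
```

Taking the extent of all projected corners gives a conservative box that ignores occlusion between objects; a system built on a 3D engine could instead render depth or segmentation maps, which would preserve it. Which approach Canvas3D uses is not stated in this summary.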
Problem

Research questions and friction points this paper is trying to address.

Precise spatial control in image generation
Managing object arrangement and scene conditions
Bridging the gap between user intent and generative model output
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D engine-driven virtual canvas for spatial control
Conversion of text into interactive objects in 3D space
User-defined spatial constraints that precisely guide generative models