Canvas3D: Empowering Precise Spatial Control for Image Generation with Constraints from a 3D Virtual Canvas

📅 2025-08-09

📈 Citations: 0

✨ Influential: 0

career value

191K/year

🤖 AI Summary

Generative AI has long struggled with precise spatial composition control in image generation, particularly in enabling fine-grained, interactive manipulation of object layout and scene conditions. To address this, we propose a virtual canvas framework grounded in a real-time 3D engine: it parses textual inputs into manipulable 3D object instances, enabling intuitive user interaction—including drag-and-drop, scaling, and constraint specification—and translates spatial intent into structured geometric constraints that guide diffusion-based image synthesis. Our key contribution is the first integration of a real-time 3D interactive engine as a human–AI co-design medium for spatial modeling, unifying natural language understanding, 3D instantiation, spatial relation reasoning, and controllable image generation. Experiments demonstrate significant improvements over state-of-the-art baselines: +32.7% IoU in spatial accuracy, 41% reduction in task completion time, and higher user satisfaction—validated through open-ended real-world usability testing.

Technology Category

Application Category

📝 Abstract

Generative AI (GenAI) has significantly advanced the ease and flexibility of image creation. However, it remains a challenge to precisely control spatial compositions, including object arrangement and scene conditions. To bridge this gap, we propose Canvas3D, an interactive system leveraging a 3D engine to enable precise spatial manipulation for image generation. Upon user prompt, Canvas3D automatically converts textual descriptions into interactive objects within a 3D engine-driven virtual canvas, empowering direct and precise spatial configuration. These user-defined arrangements generate explicit spatial constraints that guide generative models in accurately reflecting user intentions in the resulting images. We conducted a closed-end comparative study between Canvas3D and a baseline system. And an open-ended study to evaluate our system "in the wild". The result indicates that Canvas3D outperforms the baseline on spatial control, interactivity, and overall user experience.

Problem

Research questions and friction points this paper is trying to address.

Precise spatial control in image generation

Object arrangement and scene conditions management

Bridging gap between user intent and generative models

Innovation

Methods, ideas, or system contributions that make the work stand out.

3D engine-driven virtual canvas for spatial control

Text-to-interactive object conversion in 3D space

User-defined constraints guide generative models precisely

🔎 Similar Papers

No similar papers found.