3DrawAgent: Teaching LLM to Draw in 3D with Early Contrastive Experience

📅 2026-04-09
🤖 AI Summary
Generating spatially structured 3D sketches from natural language remains a significant challenge. This work proposes a training-free, language-driven framework in which a large language model (LLM) draws a sketch one 3D Bézier curve at a time, guided by geometric feedback. It introduces an early contrastive experience mechanism built on pairwise comparisons: generated sketches are ranked against each other using CLIP-based perceptual rewards together with fine-grained LLM evaluations, and the resulting better/worse pairs improve the model's 3D spatial understanding without any parameter updates. The method adapts the Group Relative Policy Optimization (GRPO) paradigm but relies solely on inference, producing structurally coherent and geometrically plausible 3D sketches from diverse textual prompts. The authors report strong generalization and geometric reasoning capabilities and position the approach as a new paradigm for training-free 3D sketch generation.
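To make the drawing primitive concrete, here is a minimal sketch of how a single 3D Bézier curve can be evaluated from its control points. The summary does not state the curve degree or how the LLM encodes control points, so the cubic form and the names `cubic_bezier_3d` and `sample_curve` are assumptions for illustration, not the paper's implementation.

```python
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def cubic_bezier_3d(p0: Point3, p1: Point3, p2: Point3, p3: Point3, t: float) -> Point3:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1] (Bernstein form).

    Assumption: cubic degree (four control points); the source does not say.
    """
    u = 1.0 - t
    # Cubic Bernstein basis weights; they sum to 1 for any t.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return tuple(
        b0 * a + b1 * b + b2 * c + b3 * d
        for a, b, c, d in zip(p0, p1, p2, p3)
    )


def sample_curve(ctrl: Sequence[Point3], n: int = 16) -> List[Point3]:
    """Sample n >= 2 evenly spaced points along one curve, e.g. for rendering."""
    if len(ctrl) != 4:
        raise ValueError("a cubic Bezier curve needs exactly 4 control points")
    if n < 2:
        raise ValueError("n must be at least 2")
    return [cubic_bezier_3d(*ctrl, i / (n - 1)) for i in range(n)]


# Hypothetical stroke: four control points such as an LLM might emit as text.
stroke = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (1.0, 0.0, 1.0)]
points = sample_curve(stroke, n=5)
```

Under this assumption a whole sketch is just a list of such control-point tuples, which is a compact format for a language model to produce and for a renderer to sample.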
📝 Abstract
Sketching in 3D space enables expressive reasoning about shape, structure, and spatial relationships, yet generating 3D sketches through natural language remains a major challenge. In this work, we introduce 3DrawAgent, a training-free, language-driven framework for 3D sketch generation that leverages large language models (LLMs) to sequentially draw 3D Bézier curves under geometric feedback. Unlike prior 2D sketch agents, our method introduces a relative experience optimization strategy that adapts the recently proposed Group Relative Policy Optimization (GRPO) paradigm. Instead of relying on explicit ground-truth supervision, we construct pairwise comparisons among generated sketches, with each pair consisting of a relatively better and a worse result based on CLIP-based perceptual rewards and LLM-based fine-grained qualitative assessment. These experiences are then used to iteratively refine the prior knowledge of 3D drawing, enabling black-box reinforcement of the model's 3D awareness. This design allows our model to self-improve its spatial understanding and drawing quality without parameter updates. Experiments show that 3DrawAgent can generate complex and coherent 3D Bézier sketches from diverse textual prompts, exhibit emergent geometric reasoning, and generalize to novel shapes, establishing a new paradigm for advancing the field of training-free 3D sketch intelligence.
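The pairwise-comparison step described in the abstract can be illustrated with a small sketch. It assumes each candidate sketch in a group already has a scalar reward (for instance a CLIP image-text similarity, which is not computed here) and ignores the LLM-based qualitative assessment entirely. The function name `build_preference_pairs` and the `margin` parameter are hypothetical and do not come from the paper.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def build_preference_pairs(
    candidates: Sequence[str],
    scores: Sequence[float],
    margin: float = 0.0,
) -> List[Tuple[str, str]]:
    """Return (better, worse) pairs from one group of scored candidates.

    Assumptions (not from the source): one scalar reward per candidate,
    and pairs whose score gap is at most `margin` are skipped as ties.
    """
    if len(candidates) != len(scores):
        raise ValueError("candidates and scores must have the same length")
    pairs: List[Tuple[str, str]] = []
    for i, j in combinations(range(len(candidates)), 2):
        diff = scores[i] - scores[j]
        if abs(diff) <= margin:
            continue  # too close to call, no experience extracted
        better, worse = (i, j) if diff > 0 else (j, i)
        pairs.append((candidates[better], candidates[worse]))
    return pairs


# Hypothetical group of three sketches for one prompt, with made-up rewards.
group = ["sketch_a", "sketch_b", "sketch_c"]
rewards = [0.31, 0.25, 0.30]
pairs = build_preference_pairs(group, rewards, margin=0.02)
```

In a training-free setting such pairs would not drive gradient updates. One plausible use, consistent with the abstract's "refine the prior knowledge of 3D drawing", is to summarize what distinguishes the better sketch from the worse one and feed that text back into later prompts.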
Problem

Research questions and friction points this paper is trying to address.

3D sketch generation
natural language
spatial reasoning
training-free
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

3D sketching
training-free
relative experience optimization
Group Relative Policy Optimization
geometric reasoning
Hongcan Xiao
Beijing University of Posts and Telecommunications
Xinyue Xiao
Beijing University of Posts and Telecommunications, Jiangnan University
Yilin Wang
Beijing University of Posts and Telecommunications
Yue Zhang
HaoHan Data
Yonggang Qi
Associate Professor, Beijing University of Posts and Telecommunications
computer vision, sketch-based vision learning algorithms and applications