🤖 AI Summary
This work addresses the challenge of generating executable Blender code with large language models (LLMs), whose output often fails because of syntax errors and geometric inconsistencies. To mitigate this, the study applies retrieval-augmented generation (RAG) to 3D modeling code synthesis, reportedly for the first time. The authors construct a multimodal dataset of 500 expert-validated samples across 50 object categories, each pairing a natural language description with the corresponding code and a rendered image. During inference, CLIP-driven semantic retrieval selects the most relevant examples to guide code generation. The approach requires neither model fine-tuning nor specialized hardware. Across four LLMs, the compilation success rate increases from 40.8% to 70.0%, and the CLIP-based semantic alignment score rises from 0.41 to 0.77, indicating that the generated 3D modeling scripts run more often and match their descriptions more closely.
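The retrieval step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are assumed to be precomputed (for instance by a CLIP text encoder), and the names `Example`, `retrieve_top_k`, and `build_prompt`, as well as the prompt layout, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One dataset entry (hypothetical structure, not the paper's schema)."""
    description: str
    code: str
    embedding: list[float]  # assumed precomputed, e.g. by a CLIP text encoder


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding: list[float],
                   examples: list[Example],
                   k: int = 3) -> list[Example]:
    """Return the k examples most similar to the query embedding."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_embedding, ex.embedding),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(request: str, retrieved: list[Example]) -> str:
    """Assemble a few-shot prompt: retrieved examples first, then the request.

    The layout is an assumption made for illustration only.
    """
    parts = [f"# Description: {ex.description}\n{ex.code}" for ex in retrieved]
    parts.append(f"# Description: {request}\n")
    return "\n\n".join(parts)
```

In this sketch the assembled prompt would be passed unchanged to an off-the-shelf LLM, which is consistent with the claim that no fine-tuning is needed: only the prompt changes, not the model weights.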
📝 Abstract
Automatic generation of executable Blender code from natural language remains challenging: state-of-the-art LLMs frequently produce syntactic errors and geometrically inconsistent objects. We present BlenderRAG, a retrieval-augmented generation system that operates on a curated multimodal dataset of 500 expert-validated examples, each a (text, code, image) triple, across 50 object categories. By retrieving semantically similar examples during generation, BlenderRAG improves compilation success rates from 40.8% to 70.0% and normalized semantic alignment (CLIP similarity) from 0.41 to 0.77 across four state-of-the-art LLMs. It requires no fine-tuning or specialized hardware, making it immediately accessible for deployment. The dataset and code will be available at https://github.com/MaxRondelli/BlenderRAG.
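As a rough illustration of how a compilation success rate could be computed, the sketch below counts the fraction of generated scripts that pass Python's built-in syntax check. This is only a proxy and an assumption on our part: the abstract does not specify the evaluation procedure, and a script that parses may still fail when executed inside Blender. The function names are hypothetical.

```python
def compiles(script: str) -> bool:
    """Return True if the script parses as valid Python.

    Assumption: this checks syntax only. It does not import bpy or run
    the script, so Blender runtime errors are not detected.
    """
    try:
        compile(script, "<generated>", "exec")
        return True
    except (SyntaxError, ValueError):
        return False


def compilation_success_rate(scripts: list[str]) -> float:
    """Fraction of scripts that pass the syntax check (0.0 for an empty list)."""
    if not scripts:
        return 0.0
    return sum(compiles(s) for s in scripts) / len(scripts)
```

For example, a batch containing one valid script such as `import bpy` followed by `bpy.ops.mesh.primitive_cube_add(size=2)` and one script with a malformed function definition would score 0.5 under this proxy.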