🤖 AI Summary
Current text-to-3D generation relies on blind, trial-and-error prompting and suffers from poor controllability, low multi-view consistency, and insufficient spatial-semantic alignment. To address these challenges, we propose the first interactive visual prompting framework explicitly designed for 3D generation. Our method employs a dual-branch architecture that fuses retrieval and generation to increase candidate diversity; introduces a multi-view hybrid scoring mechanism, grounded in multimodal large language models (MLLMs) and high-level semantic metrics, that correlates strongly with human judgments; and provides a suite of visualization tools for defect localization and iterative optimization. Extensive experiments and user studies demonstrate significant improvements: a 23.6% reduction in FID, 41% fewer user interventions (indicating better controllability), and enhanced creative expressiveness. Our framework establishes a new paradigm for text-driven 3D content creation that is interpretable, interactive, and optimization-aware.
📝 Abstract
Text-to-3D (T23D) generation has transformed digital content creation, yet it remains bottlenecked by blind trial-and-error prompting that yields unpredictable results. While visual prompt engineering has advanced in the text-to-image domain, applying it to 3D generation presents unique challenges, since it requires multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure that combines retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs together with novel high-level metrics to assess 3D models consistently with human experts; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.
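The abstract does not specify how the multi-view hybrid score is computed, so the following is only a minimal sketch of one way such a score could be assembled, not Sel3DCraft's actual method. It assumes each rendered view already has an MLLM-judged score and a high-level metric score in [0, 1]; the names `ViewScore`, `hybrid_score`, `rank_candidates`, `mllm_weight`, and `consistency_penalty` are all hypothetical, and the standard deviation across views is used purely as an illustrative stand-in for a multi-view consistency term.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import List, Mapping, Sequence


@dataclass
class ViewScore:
    """Scores for one rendered view, each assumed to lie in [0, 1]."""
    mllm: float    # hypothetical MLLM-judged quality for this view
    metric: float  # hypothetical high-level semantic metric for this view


def hybrid_score(views: Sequence[ViewScore],
                 mllm_weight: float = 0.5,
                 consistency_penalty: float = 0.5) -> float:
    """Blend the two scores per view, then penalize disagreement across views."""
    if not views:
        raise ValueError("at least one view is required")
    if not 0.0 <= mllm_weight <= 1.0:
        raise ValueError("mllm_weight must be in [0, 1]")
    per_view = [mllm_weight * v.mllm + (1.0 - mllm_weight) * v.metric
                for v in views]
    # Population standard deviation across views: a crude inconsistency proxy
    # (an assumption of this sketch, not a metric taken from the paper).
    spread = pstdev(per_view)
    return max(0.0, mean(per_view) - consistency_penalty * spread)


def rank_candidates(candidates: Mapping[str, Sequence[ViewScore]]) -> List[str]:
    """Return candidate names ordered from best to worst hybrid score."""
    return sorted(candidates,
                  key=lambda name: hybrid_score(candidates[name]),
                  reverse=True)
```

Under these assumptions, two candidates with the same average per-view score are separated by how much their views disagree, which is the kind of behavior a multi-view consistency check is meant to capture. The weights would need to be tuned against human ratings in any real system.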