Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-3D generation suffers from prompt blindness, poor controllability, low multi-view consistency, and insufficient spatial-semantic alignment. To address these challenges, we propose the first interactive visual prompting framework explicitly designed for 3D generation. Our method employs a dual-branch retrieval-generation fusion architecture to enhance candidate diversity; introduces a multi-view hybrid scoring mechanism grounded in multimodal large language models (MLLMs) and high-level semantic metrics, achieving strong correlation with human judgments; and develops a suite of visualization tools supporting defect localization and iterative optimization. Extensive experiments and user studies demonstrate significant improvements: a 23.6% reduction in FID, 41% fewer user interventions needed for comparable control, and enhanced creative expressiveness. Our framework establishes a new paradigm for text-driven 3D content creation: interpretable, interactive, and optimization-aware.

📝 Abstract
Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.
Problem

Research questions and friction points this paper is trying to address.

Unpredictable results in text-to-3D generation due to blind trial-and-error prompting
Challenges in applying visual prompts to 3D generation for multi-view consistency
Need for intuitive defect identification and refinement in 3D model creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-branch structure for diverse candidate exploration
Multi-view hybrid scoring with MLLMs for 3D assessment
Prompt-driven visual analytics for defect refinement
👥 Authors
Nan Xiang, East China Normal University
Tianyi Liang, PhD, East China Normal University; Shanghai AI Lab; Shanghai Innovation Institute (Multimodal Learning, LLMs, Image Editing)
Haiwen Huang, East China Normal University
Shiqi Jiang, East China Normal University
Hao Huang, East China Normal University
Yifei Huang, East China Normal University
Liangyu Chen, East China Normal University
Changbo Wang, East China Normal University
Chenhui Li, Baidu (AI, NLP, CV)