Sel3DCraft: Interactive Visual Prompts for User-Friendly Text-to-3D Generation

📅 2025-08-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Current text-to-3D generation suffers from prompt blindness, poor controllability, low multi-view consistency, and insufficient spatial-semantic alignment. To address these challenges, we propose the first interactive visual prompting framework explicitly designed for 3D generation. Our method employs a dual-branch retrieval-generation fusion architecture to enhance candidate diversity; introduces a multi-view hybrid scoring mechanism grounded in multimodal large language models (MLLMs) and high-level semantic metrics, achieving strong correlation with human judgments; and develops a suite of visualization tools supporting defect localization and iterative optimization. Extensive experiments and user studies demonstrate significant improvements: a 23.6% reduction in FID, 41% fewer user interventions needed for comparable control, and enhanced creative expressiveness. Our framework establishes a new paradigm for text-driven 3D content creation: interpretable, interactive, and optimization-aware.

📝 Abstract
Text-to-3D (T23D) generation has transformed digital content creation, yet remains bottlenecked by blind trial-and-error prompting processes that yield unpredictable results. While visual prompt engineering has advanced in text-to-image domains, its application to 3D generation presents unique challenges requiring multi-view consistency evaluation and spatial understanding. We present Sel3DCraft, a visual prompt engineering system for T23D that transforms unstructured exploration into a guided visual process. Our approach introduces three key innovations: a dual-branch structure combining retrieval and generation for diverse candidate exploration; a multi-view hybrid scoring approach that leverages MLLMs with innovative high-level metrics to assess 3D models with human-expert consistency; and a prompt-driven visual analytics suite that enables intuitive defect identification and refinement. Extensive testing and user studies demonstrate that Sel3DCraft surpasses other T23D systems in supporting creativity for designers.
Problem

Research questions and friction points this paper is trying to address.

Unpredictable results in text-to-3D generation due to blind trial-and-error prompting
Challenges in applying visual prompts to 3D generation for multi-view consistency
Need for intuitive defect identification and refinement in 3D model creation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-branch structure for diverse candidate exploration
Multi-view hybrid scoring with MLLMs for 3D assessment
Prompt-driven visual analytics for defect refinement
👥 Authors
Nan Xiang, East China Normal University
Tianyi Liang, PhD, East China Normal University; Shanghai AI Lab; Shanghai Innovation Institute (Multimodal Learning, LLMs, Image Editing)
Haiwen Huang, East China Normal University
Shiqi Jiang, East China Normal University
Hao Huang, East China Normal University
Yifei Huang, East China Normal University
Liangyu Chen, East China Normal University
Changbo Wang, East China Normal University
Chenhui Li, Baidu (AI, NLP, CV)