ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction

📅 2025-05-26
🤖 AI Summary
This work addresses the challenge of rapidly generating high-fidelity 3D geometry and texture from coarse 3D proxies while giving the user explicit control over structure. To this end, we propose a text-guided 3D detailizer that, once trained for a text prompt, can be reused on new coarse shapes without retraining. The detailizer is trained by distilling a pretrained, text-conditioned multi-view image diffusion model via Score Distillation Sampling (SDS), in two training stages that generate shapes of increasing structural complexity. The resulting model refines a coarse shape in under one second, supports varied and out-of-distribution structures, and is reported to produce higher-quality shapes and details than existing text-to-3D models, with strong generalization across styles, structures, and object categories.

📝 Abstract
We introduce a 3D detailizer, a neural model which can instantaneously (in <1s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes, with varied structures, whose local details all share a consistent style and appearance. Our detailizer training utilizes a pretrained multi-view image diffusion model, with text conditioning, to distill the foundational knowledge therein into our detailizer via Score Distillation Sampling (SDS). To improve SDS and enable our detailizer architecture to learn generalizable features over complex structures, we train our model in two stages to generate shapes with increasing structural complexity. Through extensive experiments, we show that our method generates shapes of superior quality and detail compared to existing text-to-3D models under varied structure control. Our detailizer can refine a coarse shape in less than a second, making it possible to interactively author and adjust 3D shapes. Furthermore, the user-imposed structure control can lead to creative, and hence out-of-distribution, 3D asset generations that are beyond the current capabilities of leading text-to-3D generative models. We demonstrate the interactive 3D modeling workflow that our method enables, and its strong generalizability over styles, structures, and object categories.
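
The abstract relies on Score Distillation Sampling (SDS) without spelling it out. As a rough illustration only, the sketch below shows the general shape of an SDS update in NumPy: noise the current output, ask a noise predictor what noise it sees, and step along the weighted difference between predicted and true noise. Everything here is an assumption made for the sake of a runnable toy and not the paper's implementation: the "rendered image" is the parameter vector itself, `make_toy_predictor` is a hypothetical stand-in for the pretrained multi-view diffusion model (it is the exact noise predictor for a data distribution concentrated on a single `target`), and the weighting `w(t) = 1 - alpha_bar[t]` is one common choice.

```python
import numpy as np

def sds_grad(x, eps_pred_fn, alpha_bar, rng):
    """One-sample SDS gradient with respect to the generated output x."""
    t = rng.integers(0, len(alpha_bar))          # random diffusion timestep
    a = alpha_bar[t]
    eps = rng.standard_normal(x.shape)           # true noise
    x_t = np.sqrt(a) * x + np.sqrt(1.0 - a) * eps  # forward-noised output
    eps_hat = eps_pred_fn(x_t, t)                # predicted noise
    w = 1.0 - a                                  # assumed weighting w(t)
    return w * (eps_hat - eps)

def make_toy_predictor(target, alpha_bar):
    """Hypothetical stand-in for a pretrained diffusion model.

    Exact noise predictor when all data mass sits on `target`.
    """
    def eps_pred(x_t, t):
        a = alpha_bar[t]
        return (x_t - np.sqrt(a) * target) / np.sqrt(1.0 - a)
    return eps_pred

def run_demo(steps=500, lr=0.5, seed=0):
    rng = np.random.default_rng(seed)
    alpha_bar = np.linspace(0.98, 0.02, 50)      # toy noise schedule
    target = np.array([1.0, -2.0, 0.5])          # what the toy "model" prefers
    theta = np.zeros(3)                          # output == parameters here
    eps_pred = make_toy_predictor(target, alpha_bar)
    for _ in range(steps):
        # The Jacobian d(output)/d(theta) is the identity in this toy setup.
        theta = theta - lr * sds_grad(theta, eps_pred, alpha_bar, rng)
    return theta, target

if __name__ == "__main__":
    theta, target = run_demo()
    print("optimized:", theta)
    print("target:   ", target)
```

In this toy the parameters are pulled toward the output the predictor considers likely. In a setting like the one the abstract describes, the same gradient would instead be backpropagated through a renderer into the weights of a network, so that the network itself, and not a single shape, absorbs the diffusion model's prior.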

Problem

Research questions and friction points this paper is trying to address.

Transforming coarse 3D shapes into detailed assets using text prompts
Enabling interactive and rapid 3D shape authoring with structure control
Generalizing style and appearance across varied 3D structures without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

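Neural detailizer that turns a coarse 3D shape proxy into a detailed asset in under one second
Text prompt defines the shape class and the appearance and style of the generated geometry and texture details
Two-stage training via Score Distillation Sampling from a pretrained multi-view diffusion model, with increasing structural complexity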