ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction

📅 2025-05-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of rapidly generating high-fidelity 3D assets, with detailed geometry and texture, from coarse 3D proxies while giving users explicit control over structure. To this end, it proposes a text-guided 3D detailizer that, once trained for a prompt, generalizes to new coarse proxies without retraining. Training follows a two-stage curriculum: the model first learns to detail structurally simple shapes, then shapes of increasing structural complexity. Knowledge from a pretrained, text-conditioned multi-view diffusion model is distilled into the detailizer via Score Distillation Sampling (SDS), and the detailizer jointly models geometry and texture with a neural implicit representation. The resulting model refines a proxy in under one second per inference, supports cross-category, cross-style, and out-of-distribution structural compositions, and delivers better detail fidelity, interactivity, and generalization than state-of-the-art text-to-3D approaches.
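For reference, the standard SDS gradient (as introduced for text-to-3D optimization in DreamFusion) is written out below. In this notation, which is ours rather than the paper's, θ stands for the detailizer's parameters, g for a differentiable renderer of the generated asset, y for the text prompt, and ε̂_φ for the frozen diffusion model's noise prediction; ART-DECO's exact weighting w(t) and multi-view conditioning may differ.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[
      w(t)\,\bigl(\hat{\epsilon}_\phi(x_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta}
    \right],
\qquad
x = g(\theta), \quad
x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon .
```

The key property is that the Jacobian of the diffusion model itself is dropped, so only the renderer and the detailizer need to be differentiated.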

📝 Abstract

We introduce a 3D detailizer, a neural model which can instantaneously (in <1 s) transform a coarse 3D shape proxy into a high-quality asset with detailed geometry and texture as guided by an input text prompt. Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details. The coarse 3D proxy, which can be easily varied and adjusted (e.g., via user editing), provides structure control over the final shape. Importantly, our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes with varied structures, whose local details all share a consistent style and appearance. Our detailizer training utilizes a pretrained multi-view image diffusion model, with text conditioning, to distill the foundational knowledge therein into our detailizer via Score Distillation Sampling (SDS). To improve SDS and enable our detailizer architecture to learn generalizable features over complex structures, we train our model in two training stages to generate shapes with increasing structural complexity. Through extensive experiments, we show that our method generates shapes of superior quality and detail compared to existing text-to-3D models under varied structure control. Our detailizer can refine a coarse shape in less than a second, making it possible to interactively author and adjust 3D shapes. Furthermore, the user-imposed structure control can lead to creative, and hence out-of-distribution, 3D asset generations that are beyond the current capabilities of leading text-to-3D generative models. We demonstrate an interactive 3D modeling workflow that our method enables, as well as its strong generalizability over styles, structures, and object categories.
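To make the SDS update concrete, here is a toy sketch in plain Python. It is not ART-DECO's training code: the "renderer" is the identity, the trainable parameters are a three-element vector, and `oracle_denoiser` is a hypothetical stand-in for the frozen text-conditioned diffusion model that simply knows the clean target the prompt describes. The weighting w(t) = 1 - alpha_bar is one common choice, assumed here.

```python
import math
import random


def oracle_denoiser(x_t, alpha_bar, target):
    """Hypothetical stand-in for a frozen diffusion model: returns the noise
    that would explain x_t if the clean signal were `target`."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [(xt - s * y) / n for xt, y in zip(x_t, target)]


def sds_step(theta, target, rng, lr=0.05):
    # Toy renderer: identity, so dx/dtheta is the identity matrix.
    x = list(theta)
    alpha_bar = rng.uniform(0.2, 0.8)  # noise level for a sampled timestep t
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    eps = [rng.gauss(0.0, 1.0) for _ in x]
    x_t = [s * xi + n * e for xi, e in zip(x, eps)]  # forward diffusion
    eps_hat = oracle_denoiser(x_t, alpha_bar, target)
    w = 1.0 - alpha_bar  # assumed weighting w(t)
    # SDS gradient: w(t) * (eps_hat - eps) * dx/dtheta, denoiser Jacobian skipped.
    grad = [w * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [th - lr * g for th, g in zip(theta, grad)]


def run(steps=500, seed=0):
    rng = random.Random(seed)
    target = [1.0, -2.0, 0.5]  # hypothetical "image" the prompt describes
    theta = [0.0, 0.0, 0.0]
    for _ in range(steps):
        theta = sds_step(theta, target, rng)
    return theta, target
```

Calling `run()` drives `theta` toward `target`, because with this oracle the residual `eps_hat - eps` is proportional to `x - target`. In a real setup, `theta` would be network weights, the renderer a differentiable multi-view renderer, and the denoiser a pretrained multi-view diffusion model; the step of ignoring the denoiser's Jacobian is the same.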

Problem

Research questions and friction points this paper is trying to address.

Transforming coarse 3D shapes into detailed assets using text prompts

Enabling interactive and rapid 3D shape authoring with structure control

Generalizing style and appearance across varied 3D structures without retraining
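One simple way to picture structure control and reuse without retraining is sketched below, assuming the proxy is a coarse voxel occupancy grid (the abstract does not fix the representation). A fixed, hypothetical `lattice_style` function stands in for the trained detailizer: fine detail may only appear inside the upsampled coarse structure, and the same function is applied unchanged to different proxies. This illustrates the idea only and is not confirmed to be how ART-DECO constrains its output.

```python
from typing import Callable, List

Grid = List[List[List[int]]]  # cubic occupancy grid indexed [x][y][z], 1 = occupied


def upsample(coarse: Grid, factor: int) -> Grid:
    """Nearest-neighbour upsampling: each coarse voxel becomes a factor^3 block."""
    m = len(coarse) * factor
    return [[[coarse[i // factor][j // factor][k // factor] for k in range(m)]
             for j in range(m)] for i in range(m)]


def detailize(coarse: Grid, detail_fn: Callable[[int, int, int], int],
              factor: int = 4) -> Grid:
    """Keep a fine voxel only if the coarse proxy is occupied there AND the
    (hypothetical) detail function turns it on."""
    mask = upsample(coarse, factor)
    m = len(mask)
    return [[[mask[i][j][k] & detail_fn(i, j, k) for k in range(m)]
             for j in range(m)] for i in range(m)]


def lattice_style(i: int, j: int, k: int) -> int:
    # Hypothetical "style": carve a column of holes at every (1, 1) offset per 4x4 tile.
    return 0 if (i % 4 == 1 and j % 4 == 1) else 1


def count(grid: Grid) -> int:
    return sum(v for plane in grid for row in plane for v in row)
```

Editing the coarse proxy changes where detail can appear, while the style function, like a trained detailizer, is reused as is.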

Innovation

Methods, ideas, or system contributions that make the work stand out.

Neural detailizer refines a coarse 3D proxy in under one second

Text prompts guide both detailed geometry and texture

Two-stage training, with increasing structural complexity, via Score Distillation Sampling (SDS)
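The two-stage idea can be sketched as a training schedule over proxies. This is a hypothetical illustration: `Proxy`, its `num_parts` field as a measure of structural complexity, the threshold, and the choice to draw from the full pool in stage 2 are all assumptions, not details given in the abstract. Each scheduled proxy would then receive one SDS update.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Proxy:
    name: str
    num_parts: int  # hypothetical measure of structural complexity


def two_stage_schedule(proxies: Sequence[Proxy], steps_per_stage: int,
                       threshold: int) -> List[Proxy]:
    """Stage 1 cycles through structurally simple proxies only;
    stage 2 cycles through the full pool, adding complex structures."""
    simple = [p for p in proxies if p.num_parts <= threshold]
    if not simple:
        raise ValueError("stage 1 needs at least one simple proxy")
    stage1 = [simple[s % len(simple)] for s in range(steps_per_stage)]
    full = list(proxies)
    stage2 = [full[s % len(full)] for s in range(steps_per_stage)]
    return stage1 + stage2
```

Starting from simple structures gives the distillation a stable signal before the model has to handle complex part arrangements, which is the motivation the abstract gives for the two stages.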
