Harnessing the Power of Training-Free Techniques in Text-to-2D Generation for Text-to-3D Generation via Score Distillation Sampling

📅 2025-05-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Score Distillation Sampling (SDS) for text-to-3D generation faces two trade-offs when training-free techniques are applied: varying the Classifier-Free Guidance (CFG) scale trades object size against surface smoothness, and varying the FreeU scale trades texture detail against geometric errors. Method: We study how the training-free techniques CFG and FreeU behave within the SDS framework, using text-to-3D generation via 2D lifting as the test application. Based on this analysis, we propose scaling both techniques dynamically with respect to the diffusion timestep or the optimization iteration. Contribution/Results: Without additional training, the proposed scheme strikes a favorable balance between texture detail and surface smoothness, while preserving the size of the output and mitigating geometric defects.

📝 Abstract
Recent studies show that simple training-free techniques, such as Classifier-Free Guidance (CFG) and FreeU, can dramatically improve the quality of text-to-2D generation outputs. However, these techniques have been underexplored through the lens of Score Distillation Sampling (SDS), a popular and effective technique for leveraging pretrained text-to-2D diffusion models in various tasks. In this paper, we aim to shed light on the effect such training-free techniques have on SDS, using text-to-3D generation via 2D lifting as the application. Our findings show that varying the CFG scale presents a trade-off between object size and surface smoothness, while varying the FreeU scale presents a trade-off between texture details and geometric errors. Based on these findings, we provide insights into how to harness training-free techniques for SDS effectively, by scaling them dynamically with respect to the timestep or the optimization iteration step. We show that the proposed scheme strikes a favorable balance between texture details and surface smoothness in text-to-3D generation, while preserving the size of the output and mitigating geometric defects.
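For readers unfamiliar with the quantities involved, the standard SDS gradient and the usual form of CFG can be written out as below. This is the commonly used formulation, not necessarily the paper's exact notation: the symbols (3D parameters \theta, renderer g, noise predictor \epsilon_\phi, prompt y, CFG scale s, weighting w(t)) are chosen here for illustration.

```latex
\begin{align}
\hat{\epsilon}_\phi(x_t; y, t)
  &= \epsilon_\phi(x_t; \varnothing, t)
   + s\,\bigl(\epsilon_\phi(x_t; y, t) - \epsilon_\phi(x_t; \varnothing, t)\bigr), \\
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  &= \mathbb{E}_{t,\epsilon}\!\left[
       w(t)\,\bigl(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\bigr)\,
       \frac{\partial x}{\partial \theta}
     \right],
  \qquad x = g(\theta),\quad x_t = \alpha_t x + \sigma_t \epsilon .
\end{align}
```

Under this notation, a dynamic scheme of the kind the abstract describes would replace the constant s with a function of the timestep t or of the optimization iteration; the specific schedule is not given in this summary.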
Problem

Research questions and friction points this paper is trying to address.

Exploring training-free techniques' impact on Score Distillation Sampling (SDS)
Analyzing trade-offs in text-to-3D generation via CFG and FreeU scaling
Optimizing dynamic scaling to balance texture, smoothness, and geometry in 3D outputs
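To make the FreeU side of the trade-off concrete, here is a minimal NumPy sketch of a FreeU-style re-weighting as publicly described for that technique: backbone features are scaled by a factor b, and the low-frequency band of the skip features is scaled by a factor s in the Fourier domain. The function names, the half-channel split, the square mask, and the `radius` parameter are simplifying assumptions for illustration, not the paper's or any library's implementation.

```python
import numpy as np


def scale_low_freq(h, s, radius=1):
    """Scale the low-frequency band of skip features h (C, H, W) by s.

    Assumption: a square mask of half-width `radius` around the centred
    zero-frequency bin stands in for the low-frequency region.
    """
    height, width = h.shape[-2:]
    freq = np.fft.fftshift(np.fft.fft2(h, axes=(-2, -1)), axes=(-2, -1))
    mask = np.ones((height, width))
    cy, cx = height // 2, width // 2  # zero frequency sits here after fftshift
    mask[cy - radius:cy + radius, cx - radius:cx + radius] = s
    freq = freq * mask
    out = np.fft.ifft2(np.fft.ifftshift(freq, axes=(-2, -1)), axes=(-2, -1))
    return out.real


def freeu_like(backbone, skip, b, s, radius=1):
    """Re-weight decoder inputs, then concatenate them along the channel axis.

    Assumption: only the first half of the backbone channels is scaled by b.
    """
    scaled = backbone.astype(float)  # astype returns a copy by default
    half = scaled.shape[0] // 2
    scaled[:half] *= b
    filtered = scale_low_freq(skip.astype(float), s, radius)
    return np.concatenate([scaled, filtered], axis=0)
```

With b = 1 and s = 1 the function reduces to a plain concatenation, which makes it easy to compare against an unmodified decoder when varying either scale.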
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging training-free techniques for 3D generation
Dynamic scaling of CFG and FreeU parameters
Balancing texture details and surface smoothness
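The idea of dynamic scaling listed above can be sketched as a scale that changes over the course of optimization and is fed into the CFG-combined SDS residual. The linear shape of the schedule, its direction, and the start and end values below are placeholders chosen for illustration; this summary does not state the schedule the authors actually use, and all function names are hypothetical.

```python
import numpy as np


def linear_schedule(progress, start, end):
    """Interpolate a scale linearly from `start` to `end` as progress goes 0 -> 1."""
    p = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return start + (end - start) * p


def cfg_noise(eps_uncond, eps_cond, scale):
    """Standard CFG combination of unconditional and conditional noise predictions."""
    return eps_uncond + scale * (eps_cond - eps_uncond)


def sds_grad(eps_uncond, eps_cond, eps, scale, weight=1.0):
    """SDS residual w(t) * (eps_hat - eps); the renderer Jacobian is omitted."""
    return weight * (cfg_noise(eps_uncond, eps_cond, scale) - eps)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    total_iters = 5
    for k in range(total_iters):
        progress = k / (total_iters - 1)
        # Placeholder values, not taken from the paper.
        scale = linear_schedule(progress, start=20.0, end=7.5)
        # Random arrays stand in for the diffusion model's noise predictions.
        eps_u, eps_c, eps = rng.standard_normal((3, 4))
        grad = sds_grad(eps_u, eps_c, eps, scale)
        print(k, round(scale, 2), grad.shape)
```

The same `linear_schedule` helper could drive a FreeU scale instead of the CFG scale, or be indexed by the diffusion timestep rather than the iteration count.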
Authors
Junhong Lee
Pohang University of Science and Technology (POSTECH), South Korea
Seungwook Kim
PhD Candidate, POSTECH (Pohang University of Science and Technology) / Research Intern @ Bytedance
Computer Vision
Minsu Cho
Pohang University of Science and Technology (POSTECH), South Korea