Text-to-3D Generation using Jensen-Shannon Score Distillation

📅 2025-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing text-to-3D generation methods rely on reverse KL divergence, which leads to oversaturated and over-smoothed 3D assets, low inter-sample diversity, and unstable optimization. This work introduces Jensen-Shannon (JS) divergence into score distillation for the first time, yielding a bounded, symmetric, and numerically stable objective. Grounded in GAN theory, the authors assume a log-odds discriminator and propose a minority sampling strategy to estimate the gradients robustly. The method uses pre-trained text-to-image diffusion models (e.g., SDXL) for guidance. Evaluated on T3Bench, it shows substantial improvements over state-of-the-art KL-based approaches: the generated 3D assets exhibit richer surface details, higher geometric fidelity, greater cross-sample diversity, and better distribution alignment.

📝 Abstract

Score distillation sampling is an effective technique for generating 3D models from text prompts, using pre-trained large-scale text-to-image diffusion models as guidance. However, the resulting 3D assets tend to be over-saturated and over-smoothed, with limited diversity. These issues result from the reverse Kullback-Leibler (KL) divergence objective, which makes the optimization unstable and leads to mode-seeking behavior. In this paper, we derive a bounded score distillation objective based on Jensen-Shannon divergence (JSD), which stabilizes the optimization process and produces high-quality 3D generation. JSD can match the generated and target distributions well, thereby mitigating mode seeking. We provide a practical implementation of JSD by using the theory of generative adversarial networks to define an approximate objective function for the generator, assuming the discriminator is well trained. By assuming the discriminator follows a log-odds classifier, we propose a minority sampling algorithm to estimate the gradients of our proposed objective. We conduct both theoretical and empirical studies to validate our method. Experimental results on T3Bench demonstrate that our method can produce high-quality and diverse 3D assets.
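To make the "bounded" property concrete, the following minimal sketch compares reverse KL with JSD on two toy discrete distributions. It is an illustration only, not the paper's objective: the function names and the example distributions are hypothetical, and the only facts it relies on are the standard definitions of KL and JSD, under which JSD is symmetric and never exceeds log 2 (in nats).

```python
import math

def kl(p, q):
    """KL(p || q) for discrete distributions given as lists of probabilities.

    Assumes q[i] > 0 wherever p[i] > 0 (otherwise the divergence is infinite).
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def jsd(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Hypothetical toy distributions over three outcomes.
# The target puts almost no mass on the last outcome; the generated one puts most mass there.
target = [0.5, 0.5 - 1e-9, 1e-9]
generated = [0.1, 0.1, 0.8]

reverse_kl = kl(generated, target)   # grows without bound as the target mass shrinks
js = jsd(generated, target)          # stays within [0, log 2]

print(f"reverse KL = {reverse_kl:.3f}")
print(f"JSD        = {js:.3f} (upper bound log 2 = {math.log(2):.3f})")
```

Shrinking the target's mass on the last outcome further makes the reverse KL value larger and larger, while the JSD value stays below log 2.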

Problem

Research questions and friction points this paper is trying to address.

Over-saturation and over-smoothing in text-to-3D generation.

Limited diversity due to reverse KL divergence optimization.

Unstable optimization and mode-seeking behavior caused by the reverse KL objective.

Innovation

Methods, ideas, or system contributions that make the work stand out.

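A bounded score distillation objective based on Jensen-Shannon divergence stabilizes optimization.

GAN theory defines an approximate generator objective, assuming a well-trained discriminator.

A minority sampling algorithm estimates the gradients of the proposed objective.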
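The link between GANs, a log-odds discriminator, and JSD can be checked numerically with the textbook GAN result. This is a sketch under stated assumptions, not the paper's derivation or its minority sampling algorithm: for fixed discrete distributions p (target) and q (generated), the optimal discriminator is D*(x) = p(x) / (p(x) + q(x)), its logit equals log p(x) - log q(x), and the GAN value at D* equals 2 JSD(p, q) - log 4. All names and the example distributions below are hypothetical.

```python
import math

def jsd(p, q):
    """Jensen-Shannon divergence between two strictly positive discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    kl_pm = sum(pi * math.log(pi / mi) for pi, mi in zip(p, m))
    kl_qm = sum(qi * math.log(qi / mi) for qi, mi in zip(q, m))
    return 0.5 * kl_pm + 0.5 * kl_qm

def optimal_discriminator(p, q):
    """Textbook optimal GAN discriminator for fixed p (target) and q (generated)."""
    return [pi / (pi + qi) for pi, qi in zip(p, q)]

def logit(d):
    """Log-odds of a probability d in (0, 1)."""
    return math.log(d) - math.log(1 - d)

def gan_value(p, q, d):
    """GAN value: E_p[log D(x)] + E_q[log(1 - D(x))]."""
    real_term = sum(pi * math.log(di) for pi, di in zip(p, d))
    fake_term = sum(qi * math.log(1 - di) for qi, di in zip(q, d))
    return real_term + fake_term

# Hypothetical toy distributions with full support.
p = [0.6, 0.3, 0.1]
q = [0.2, 0.3, 0.5]

d_star = optimal_discriminator(p, q)
log_odds = [logit(di) for di in d_star]
log_ratio = [math.log(pi) - math.log(qi) for pi, qi in zip(p, q)]
value = gan_value(p, q, d_star)

print("logit(D*)      :", [round(v, 4) for v in log_odds])
print("log p - log q  :", [round(v, 4) for v in log_ratio])
print("V(D*)          :", round(value, 6))
print("2*JSD - log 4  :", round(2 * jsd(p, q) - math.log(4), 6))
```

Under these assumptions, a well-trained discriminator's log-odds act as an estimate of the log density ratio between the target and generated distributions, which is why a generator objective can be written in terms of it.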
