Stable Score Distillation

📅 2025-07-12

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

To address poor stability, weak spatial control, and insufficient editing strength in text-guided image and 3D editing, this paper proposes Stable Score Distillation (SSD), a lightweight diffusion-based editing framework. It builds on the Score Distillation Sampling (SDS) paradigm and requires no auxiliary networks or additional losses. Key contributions include: (1) replacing complex auxiliary structures with a single classifier anchored to the source prompt, which improves both optimization stability and semantic alignment; (2) a cross-prompt alignment mechanism derived from the Classifier-Free Guidance (CFG) equation, combined with a constant null-text branch, which together preserve content fidelity and stabilize optimization; and (3) an explicit prompt-enhancement branch that strengthens stylistic edits. On 2D image editing and NeRF-based 3D editing, the method achieves state-of-the-art results, with faster convergence, higher computational efficiency, and improved robustness.
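For readers new to the two building blocks named above, here is a minimal NumPy sketch of the standard CFG equation and the SDS gradient. Fixed arrays stand in for a diffusion model's noise predictions; the names `cfg_noise` and `sds_grad`, the guidance scale of 7.5, and the toy values are illustrative assumptions. It shows background only, not SSD's update rule, which this page does not spell out.

```python
import numpy as np


def cfg_noise(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Classifier-free guidance: start from the null-text (unconditional)
    prediction and move w times along the prompt-conditioned direction."""
    return eps_uncond + w * (eps_cond - eps_uncond)


def sds_grad(eps_hat: np.ndarray, noise: np.ndarray, weight: float = 1.0) -> np.ndarray:
    """SDS gradient w.r.t. the optimized image, with the U-Net Jacobian dropped:
    weight(t) * (guided noise prediction - injected noise)."""
    return weight * (eps_hat - noise)


# Toy stand-ins (assumed values) for U-Net predictions on a 4-pixel image.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4)   # noise used to form x_t
eps_uncond = noise + 0.1         # hypothetical prediction for the empty prompt
eps_cond = noise + 0.3           # hypothetical prediction for the text prompt

eps_hat = cfg_noise(eps_uncond, eps_cond, w=7.5)
grad = sds_grad(eps_hat, noise)  # about 0.1 + 7.5 * 0.2 = 1.6 in every pixel

x = np.zeros(4)                  # hypothetical image parameters
x = x - 0.01 * grad              # one gradient-descent step
```

In a real SDS loop the gradient would flow back through a renderer or latent (for example a NeRF), so `eps_hat - noise` acts as the per-pixel update direction.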


📝 Abstract

Text-guided image and 3D editing have advanced with diffusion-based models, yet methods such as Delta Denoising Score (DDS) often struggle with stability, spatial control, and editing strength. These limitations stem from reliance on complex auxiliary structures, which introduce conflicting optimization signals and restrict precise, localized edits. We introduce Stable Score Distillation (SSD), a streamlined framework that enhances stability and alignment in the editing process by anchoring a single classifier to the source prompt. Specifically, SSD uses the Classifier-Free Guidance (CFG) equation to achieve cross-prompt alignment and introduces a constant-term null-text branch to stabilize the optimization process. This approach preserves the structure of the original content and keeps editing trajectories closely aligned with the source prompt, enabling smooth, prompt-specific modifications while maintaining coherence in surrounding regions. Additionally, SSD incorporates a prompt-enhancement branch to boost editing strength, particularly for style transformations. Our method achieves state-of-the-art results in 2D and 3D editing tasks, including NeRF and text-driven style edits, with faster convergence and reduced complexity, providing a robust and efficient solution for text-guided editing.
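The abstract positions SSD against Delta Denoising Score. For reference, a minimal NumPy sketch of the DDS baseline gradient follows. It assumes a toy linear noise predictor (`toy_eps`, hypothetical) in place of a diffusion U-Net, an all-zero embedding as the null text, and made-up prompt embeddings. It illustrates the property DDS relies on: the edited latent and the fixed source latent share one noise sample and timestep, so when the target prompt equals the source prompt and nothing has been edited yet, the gradient is exactly zero. This is the baseline, not SSD.

```python
import numpy as np


def toy_eps(x_t, prompt_emb):
    # Hypothetical stand-in for a text-conditioned noise predictor eps(x_t, y).
    return 0.5 * x_t + prompt_emb


def guided_eps(x_t, prompt_emb, w):
    # CFG with a null-text branch, modeled here as an all-zero embedding.
    eps_u = toy_eps(x_t, np.zeros_like(prompt_emb))
    return eps_u + w * (toy_eps(x_t, prompt_emb) - eps_u)


def dds_grad(z, z_src, y_tgt, y_src, noise, alpha_bar, w=7.5):
    """Delta Denoising Score gradient on the edited latent z.

    The edited latent and the fixed source latent are noised with the SAME
    noise sample and timestep; subtracting the source-branch prediction removes
    the shared noise term that makes plain SDS gradients noisy."""
    a, s = np.sqrt(alpha_bar), np.sqrt(1.0 - alpha_bar)
    z_t = a * z + s * noise
    z_src_t = a * z_src + s * noise
    return guided_eps(z_t, y_tgt, w) - guided_eps(z_src_t, y_src, w)


rng = np.random.default_rng(0)
z_src = rng.standard_normal(4)            # latent of the source image
noise = rng.standard_normal(4)
y_src = np.array([1.0, 0.0, 0.0, 0.0])    # hypothetical source-prompt embedding
y_tgt = np.array([0.0, 1.0, 0.0, 0.0])    # hypothetical target-prompt embedding

# Same prompt, unedited latent: both branches are identical and cancel.
g_same = dds_grad(z_src.copy(), z_src, y_src, y_src, noise, alpha_bar=0.5)
# Different target prompt: a non-zero editing direction appears.
g_edit = dds_grad(z_src.copy(), z_src, y_tgt, y_src, noise, alpha_bar=0.5)
```

An optimizer would then update the edited latent with `z -= lr * g_edit`. With this linear toy model the edit direction reduces to `w * (y_tgt - y_src)`; a real U-Net gives a spatially varying direction.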

Problem

Research questions and friction points this paper is trying to address.

Improves stability in text-guided image and 3D editing

Enhances spatial control and editing strength

Reduces complexity and conflicting optimization signals

Innovation

Methods, ideas, or system contributions that make the work stand out.

Anchors classifier to source prompt for stability

Uses CFG equation for cross-prompt alignment

Adds null-text branch to stabilize optimization
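This page does not give SSD's exact update rule, so the sketch below shows only the standard implicit-classifier reading of the CFG equation that these points build on. In CFG, `eps(x_t, y) - eps(x_t, null)` acts as a classifier direction for prompt `y` (up to a factor of `-1/sigma_t`, an estimate of the gradient of log p(y | x_t)). Contrasting a target-prompt classifier with a source-prompt classifier on the same noisy latent cancels the shared null-text prediction. The nonlinear `toy_eps` predictor and the scalar prompt values are hypothetical; how SSD keeps its null-text branch as a constant term is not reproduced here.

```python
import numpy as np


def toy_eps(x_t, y):
    # Hypothetical nonlinear stand-in for a text-conditioned noise predictor.
    return np.tanh(x_t) + y * x_t


def classifier_dir(x_t, y, null):
    """Implicit-classifier direction from the CFG equation:
    eps(x_t, y) - eps(x_t, null)."""
    return toy_eps(x_t, y) - toy_eps(x_t, null)


def cross_prompt_dir(x_t, y_tgt, y_src, null):
    """Target-classifier direction minus source-classifier direction, both on
    the same noisy latent. The shared null-text prediction cancels, leaving
    eps(x_t, y_tgt) - eps(x_t, y_src)."""
    return classifier_dir(x_t, y_tgt, null) - classifier_dir(x_t, y_src, null)


x_t = np.linspace(-1.0, 1.0, 5)           # toy noisy latent
null = np.zeros(5)                         # null-text "embedding" (assumed)
y_src, y_tgt = np.full(5, 0.2), np.full(5, 0.5)

d = cross_prompt_dir(x_t, y_tgt, y_src, null)
d_same = cross_prompt_dir(x_t, y_src, y_src, null)  # no edit requested
```

With this toy predictor `d` equals `0.3 * x_t`, and `d_same` is zero: anchoring the contrast to the source prompt means no change is pushed when the target matches the source.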
