🤖 AI Summary
To address poor stability, weak spatial control, and insufficient editing strength in text-guided image and 3D editing, this paper proposes a lightweight diffusion-based editing framework. Methodologically, it integrates into the Score Distillation Sampling (SDS) paradigm without requiring auxiliary networks or additional losses. Key contributions include: (1) eliminating complex auxiliary structures and instead anchoring optimization to a single classifier guided by the source prompt, enhancing both optimization stability and semantic alignment; (2) introducing a cross-prompt alignment mechanism derived from the Classifier-Free Guidance (CFG) equation, coupled with a fixed null-text branch, to jointly ensure content fidelity and training stability; and (3) designing an explicit prompt-augmentation branch to strengthen stylistic editing capability. Evaluated on both 2D image editing and NeRF-driven 3D editing tasks, the method achieves state-of-the-art performance, with faster convergence, higher computational efficiency, and improved robustness.
📝 Abstract
Text-guided image and 3D editing have advanced with diffusion-based models, yet methods like Delta Denoising Score often struggle with stability, spatial control, and editing strength. These limitations stem from reliance on complex auxiliary structures, which introduce conflicting optimization signals and restrict precise, localized edits. We introduce Stable Score Distillation (SSD), a streamlined framework that enhances stability and alignment in the editing process by anchoring a single classifier to the source prompt. Specifically, SSD uses the Classifier-Free Guidance (CFG) equation to achieve cross-prompt alignment, and introduces a constant-term null-text branch to stabilize the optimization process. This approach preserves the original content's structure and keeps editing trajectories closely aligned with the source prompt, enabling smooth, prompt-specific modifications while maintaining coherence in surrounding regions. Additionally, SSD incorporates a prompt enhancement branch to boost editing strength, particularly for style transformations. Our method achieves state-of-the-art results in 2D and 3D editing tasks, including NeRF and text-driven style edits, with faster convergence and reduced complexity, providing a robust and efficient solution for text-guided editing.
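For readers unfamiliar with the CFG equation the abstract refers to, the following is a minimal sketch of the standard classifier-free guidance combination only; it is not the SSD objective, whose exact form the abstract does not give. The function names `cfg_combine` and `classifier_direction` and the plain-list representation of noise predictions are assumptions made for illustration. The difference between the prompt-conditioned and null-text predictions is commonly interpreted as an implicit classifier direction.

```python
from typing import List, Sequence


def classifier_direction(eps_null: Sequence[float],
                         eps_prompt: Sequence[float]) -> List[float]:
    """Difference between prompt-conditioned and null-text noise predictions.

    In standard CFG this difference is often read as the direction of an
    implicit classifier for the prompt. (Hypothetical helper, for illustration.)
    """
    if len(eps_null) != len(eps_prompt):
        raise ValueError("noise predictions must have the same length")
    return [p - n for n, p in zip(eps_null, eps_prompt)]


def cfg_combine(eps_null: Sequence[float],
                eps_prompt: Sequence[float],
                guidance_scale: float) -> List[float]:
    """Standard CFG: eps_null + w * (eps_prompt - eps_null)."""
    direction = classifier_direction(eps_null, eps_prompt)
    return [n + guidance_scale * d for n, d in zip(eps_null, direction)]


# Toy values standing in for a denoiser's outputs at one timestep.
eps_null = [0.0, 1.0]      # prediction with the null-text (empty) prompt
eps_prompt = [1.0, 3.0]    # prediction with a text prompt
guided = cfg_combine(eps_null, eps_prompt, guidance_scale=2.0)
```

With a guidance scale of 1 the combination reduces to the prompt-conditioned prediction, and with a scale of 0 it reduces to the null-text prediction, which is why the null-text branch acts as the fixed reference point in this formula.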