AI Summary
In text-conditioned image and 3D generation, Score Distillation Sampling (SDS) often yields blurry edits and identity distortion (e.g., pose or structural misalignment) because its gradient estimates are noisy. To address this, we propose Identity-preserving Distillation Sampling (IDS), a framework centered on fixed-point iterative regularization (FPR). FPR is presented as the first method to directly regularize the text-conditioned score function within the SDS paradigm. It self-calibrates the gradient bias without requiring reference image pairs, which keeps the identity consistent before and after editing. The approach improves structural fidelity and detail sharpness in both text-driven image editing and editable Neural Radiance Fields (NeRFs), suppressing blur and identity drift. Quantitative and qualitative evaluations show improvements over existing state-of-the-art methods across multiple metrics.
Abstract
Score distillation sampling (SDS) is a powerful approach to text-conditioned 2D image and 3D object generation that distills knowledge from learned score functions. However, SDS often suffers from blurriness caused by noisy gradients. When SDS is applied to image editing, this degradation can be reduced by correcting bias shifts with reference pairs, but such de-biasing techniques are still corrupted by erroneous gradients. To this end, we introduce Identity-preserving Distillation Sampling (IDS), which compensates for the gradient components that lead to undesired changes in the results. Based on the analysis that these errors originate in the text-conditioned scores, we propose a new regularization technique, called fixed-point iterative regularization (FPR), that modifies the score itself and thereby preserves identity, including pose and structure. Thanks to the self-correction provided by FPR, the proposed method yields clear and unambiguous representations that match the given prompts in image-to-image editing and editable neural radiance fields (NeRF). Structural consistency between the source and the edited data is maintained better than with other state-of-the-art methods.
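For readers unfamiliar with SDS, the following is a minimal sketch of the standard one-sample SDS gradient estimate that the abstract refers to. It does not implement IDS or FPR, whose details are not given here. The function names, the closed-form `toy_predict_noise` stand-in for a text-conditioned diffusion model, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def sds_gradient(x, predict_noise, alpha_bar_t, rng, weight=1.0):
    """One-sample SDS gradient estimate with respect to the image x (toy sketch)."""
    eps = rng.standard_normal(x.shape)  # injected Gaussian noise
    # Forward diffusion: mix the clean image with noise at level alpha_bar_t.
    x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps
    # A real pipeline would call a text-conditioned noise predictor here.
    eps_hat = predict_noise(x_t, alpha_bar_t)
    # SDS uses the residual directly and skips the predictor's Jacobian.
    return weight * (eps_hat - eps)

def toy_predict_noise(x_t, alpha_bar_t, target=0.5):
    """Hypothetical stand-in: the noise implied if the clean image were constant `target`."""
    return (x_t - np.sqrt(alpha_bar_t) * target) / np.sqrt(1.0 - alpha_bar_t)

rng = np.random.default_rng(0)
x = np.zeros((4, 4))  # toy "image" being optimized
g = sds_gradient(x, toy_predict_noise, alpha_bar_t=0.5, rng=rng)
x_next = x - 0.1 * g  # one gradient-descent step moves x toward the target
```

With this idealized predictor the injected noise cancels exactly, so the estimate is noise-free. With a learned predictor the residual `eps_hat - eps` generally retains noise and bias, which is the source of the blurriness and identity drift described above.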