🤖 AI Summary
Existing training-free diffusion-based editing methods require re-running the full denoising process for each editing strength, incurring substantial computational overhead. This work proposes an efficient framework for continuous semantic editing by estimating the local tangent space of the data manifold through perturbed samples, constructing a Jacobian-free local tangent frame, and alternately performing small-step gradient updates and diffusion projections within this coordinate system. The method requires neither re-sampling nor additional training, and theoretical analysis guarantees that the constructed tangent frame approximates the true tangent space. It enables high-fidelity, interactive semantic manipulation, supporting both smooth unsupervised semantic traversal and efficient CLIP-guided continuous adjustments. Experiments demonstrate significant improvements in both editing efficiency and generation quality.
📝 Abstract
Diffusion models are a leading paradigm for data generation, but training-free editing typically re-runs the full denoising trajectory for every edit strength, making iterative refinement expensive. We instead edit near the data manifold, where small local updates can replace repeated re-synthesis. To this end, we estimate a local manifold tangent space directly from perturbed samples and prove that this sample-based estimator closely approximates the true tangent space. Building on this guarantee, we devise a Jacobian-free algorithm that constructs a tangent frame via small perturbations to the initial noise and alternates small tangent moves with diffusion-based projections. Updates within this frame follow principled on-manifold directions while suppressing off-manifold drift, enabling fine-grained edits without full re-diffusion or additional training. Edit strength is controlled by the number of steps, which allows rapid, continuous adjustments that preserve fidelity, and the method plugs into existing samplers. Empirically, the resulting tangent directions yield smooth, semantic unsupervised traversals and effective CLIP-guided optimization, demonstrating practical, interactive continuous editing.
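The loop described in the abstract (perturb the initial noise, build a tangent frame from the resulting samples, then alternate small tangent moves with projections) can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a hypothetical `toy_generator` that maps 2-D "noise" onto the unit sphere in place of a diffusion sampler, uses an SVD of sample differences as one plausible sample-based tangent estimator, replaces the diffusion-based projection with exact renormalization onto the sphere, and uses a constant gradient as a stand-in for a CLIP-style objective. All function names and parameters are illustrative.

```python
import numpy as np

def toy_generator(z):
    """Hypothetical stand-in for a sampler: maps noise z in R^2 to the unit sphere in R^3."""
    theta, phi = z
    return np.array([np.sin(theta) * np.cos(phi),
                     np.sin(theta) * np.sin(phi),
                     np.cos(theta)])

def tangent_frame(generator, z, k, eps=1e-3, n_samples=16, seed=0):
    """Estimate a k-dimensional tangent frame at generator(z) without any Jacobian.

    Assumption: the frame is taken as the top-k right singular vectors of the
    differences between perturbed samples and the unperturbed sample.
    """
    rng = np.random.default_rng(seed)
    x0 = generator(z)
    deltas = np.stack([
        generator(z + eps * rng.standard_normal(z.shape)) - x0
        for _ in range(n_samples)
    ])
    _, _, vt = np.linalg.svd(deltas, full_matrices=False)
    return vt[:k]  # shape (k, d), orthonormal rows

def project(x):
    """Toy projection back to the manifold (stands in for a diffusion-based projection)."""
    return x / np.linalg.norm(x)

def edit(x, frame, grad_fn, n_steps, step=0.05):
    """Alternate a small move inside the tangent frame with a projection.

    The number of steps plays the role of edit strength. The frame is held
    fixed, so this is only meaningful for local edits around the start point.
    """
    for _ in range(n_steps):
        g = grad_fn(x)
        g_tangent = frame.T @ (frame @ g)  # drop the off-manifold component
        x = project(x + step * g_tangent)
    return x

if __name__ == "__main__":
    z = np.array([1.0, 0.5])
    x0 = toy_generator(z)
    frame = tangent_frame(toy_generator, z, k=2)
    # Toy objective: increase the first coordinate (stand-in for a guidance score).
    grad = lambda x: np.array([1.0, 0.0, 0.0])
    for n in (0, 2, 8):
        print(n, edit(x0, frame, grad, n_steps=n))
```

In this toy setting, raising `n_steps` moves the point further along the sphere toward a larger first coordinate while each iterate stays on the manifold, which mirrors the idea of controlling edit strength by step count instead of re-running generation. With a real diffusion model, the frame would likely need to be re-estimated or transported for larger edits; that detail is outside what the abstract specifies.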