Drag within Prior Distribution: Text-Conditioned Point-Based Image Editing within Distribution Constraints

📅 2026-05-13

🤖 AI Summary

This work addresses the limitations of existing point-based diffusion editing methods, which often suffer from semantic drift and artifacts due to trajectory ambiguity or accumulated perturbations during large-scale manipulations, while struggling to preserve the original data distribution. To overcome these issues, the authors propose a latent-space editing framework that integrates CLIP-guided semantic alignment, a prior-preserving loss, and a direction-weighted point tracking mechanism. The prior-preserving loss constrains latent variables to remain within the diffusion prior distribution, while the direction-weighted strategy enhances point tracking accuracy. Semantic alignment under text conditions is achieved through CLIP guidance. This approach substantially improves semantic consistency and naturalness in both global and fine-grained edits, effectively suppressing artifacts while maintaining high image fidelity.
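To make the idea of a prior-preserving constraint concrete, here is a minimal sketch in Python with NumPy. It assumes the diffusion prior is a standard normal and penalizes an edited latent whose empirical mean and variance drift away from 0 and 1. The function name `prior_preservation_penalty` and the moment-matching form are illustrative assumptions, not the loss defined by the authors.

```python
import numpy as np


def prior_preservation_penalty(latent) -> float:
    """Hypothetical penalty (an assumption, not the paper's loss).

    Measures how far the latent's empirical mean and variance are from
    those of a standard normal prior (mean 0, variance 1).
    """
    z = np.asarray(latent, dtype=np.float64).ravel()
    mean_term = z.mean() ** 2          # zero when the mean is 0
    var_term = (z.var() - 1.0) ** 2    # zero when the variance is 1
    return float(mean_term + var_term)


# A latent drawn from the prior versus one that has been scaled and shifted.
rng = np.random.default_rng(0)
in_prior = rng.standard_normal((4, 64, 64))
drifted = 1.5 * in_prior + 0.3

print(prior_preservation_penalty(in_prior) < prior_preservation_penalty(drifted))
```

In an actual editing loop, a term of this kind would be added to the motion objective so that large drags cannot push the latent arbitrarily far from the region the model was trained to denoise.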

📝 Abstract

Diffusion-based point editing methods have gained significant traction in image editing tasks due to their ability to manipulate image semantics and fine details by applying localized perturbations on the manifold of the noise latent. However, these approaches face several limitations. Traditional point-based editing relies on pairs of handle and target points to define motion trajectories, which can introduce ambiguity or unnecessary alterations. Furthermore, when the distance between the handle and target points is large, the accumulated perturbations often cause the noise latent to deviate from the inversion score trajectory, resulting in unnatural artifacts. To address these issues in global editing tasks, we introduce a CLIP-based model to evaluate and guide intermediate editing steps, ensuring that the generated results remain semantically aligned with the text condition. Additionally, we propose a prior-preservation loss that constrains the optimized latent code to stay within the sampling space of the diffusion prior. This improves consistency with the original data distribution, so that the model generates images along a familiar score trajectory. For fine-grained tasks, we present a directionally weighted point tracking mechanism that steers the editing process toward the target direction within similar feature regions. This improves both tracking accuracy and generation quality, while also reducing editing time.
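The directionally weighted tracking described above can be illustrated with a small NumPy sketch. It searches a window around the current handle point for the pixel whose feature best matches a reference feature, and adds a penalty for candidates that do not move toward the target. The function `track_point`, the additive cost, the L1 feature distance, and the parameters `radius` and `lam` are all assumptions made for illustration; the paper's exact weighting is not given in this summary.

```python
import numpy as np


def track_point(feat, ref_vec, handle, target, radius=2, lam=0.5):
    """Hypothetical direction-weighted nearest-neighbour tracking.

    feat:    (H, W, C) feature map of the current edited image
    ref_vec: (C,) feature of the original handle point
    handle, target: (row, col) integer coordinates
    Returns the updated handle position as a (row, col) tuple.
    """
    height, width, _ = feat.shape
    to_target = np.array(target, dtype=float) - np.array(handle, dtype=float)
    target_norm = np.linalg.norm(to_target)

    best, best_cost = tuple(handle), np.inf
    for r in range(max(0, handle[0] - radius), min(height, handle[0] + radius + 1)):
        for c in range(max(0, handle[1] - radius), min(width, handle[1] + radius + 1)):
            step = np.array([r - handle[0], c - handle[1]], dtype=float)
            step_norm = np.linalg.norm(step)
            # Cosine between the candidate step and the handle-to-target direction.
            if step_norm > 0 and target_norm > 0:
                cos = float(step @ to_target) / (step_norm * target_norm)
            else:
                cos = 0.0
            feat_dist = float(np.abs(feat[r, c] - ref_vec).sum())
            # Assumed cost: feature mismatch plus a penalty for moving off-direction.
            cost = feat_dist + lam * (1.0 - cos)
            if cost < best_cost:
                best, best_cost = (r, c), cost
    return best


# In a featureless region, the direction term alone moves the handle toward the target.
flat = np.zeros((11, 11, 3))
print(track_point(flat, np.zeros(3), (5, 5), (5, 9), radius=1))
```

With `lam` set to 0 this reduces to plain nearest-neighbour feature matching; a larger `lam` favors candidates aligned with the target direction when several pixels have similar features.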

Problem

Research questions and friction points this paper is trying to address.

point-based editing

diffusion models

latent deviation

distribution constraints

image editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion prior

CLIP-guided editing

point-based image editing