Hands-off Image Editing: Language-guided Editing without any Task-specific Labeling, Masking or even Training

📅 2025-02-14
🏛️ International Conference on Computational Linguistics
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing instruction-guided image editing methods rely on task-specific annotations, segmentation masks, or model fine-tuning, which limits how well they generalize and how efficiently they can be deployed. This paper introduces a language-guided image editing framework that requires no task-specific annotations, masks, or model adaptation, enabling “plug-and-play” editing. The approach leverages pre-trained multimodal models (CLIP and a diffusion prior) and uses gradient-driven latent-space inversion to find editing trajectories that remain semantically consistent under text-image alignment constraints. In the authors' qualitative and quantitative evaluation, covering editing accuracy, image fidelity, and output diversity, the method achieves very competitive performance.

📝 Abstract
Instruction-guided image editing consists of taking an image and an instruction and delivering that image altered according to that instruction. State-of-the-art approaches to this task suffer from the typical scaling-up and domain-adaptation hindrances related to supervision, as they eventually resort to some kind of task-specific labelling, masking or training. We propose a novel approach that dispenses with any such task-specific supervision and thus offers better potential for improvement. Its assessment demonstrates that it is highly effective, achieving very competitive performance.
Problem

Research questions and friction points this paper is trying to address.

Language-guided image editing without supervision
Eliminates task-specific labeling, masking, or training
Enhances scalability and domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Language-guided image editing
No task-specific supervision
Achieves very competitive performance