Concept Lancet: Image Editing with Compositional Representation Transplant

📅 2025-04-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
In diffusion-based image editing, editing strength is difficult to adaptively calibrate: excessive strength disrupts visual consistency, while insufficient strength fails to achieve the desired edit—requiring costly trial-and-error. This paper proposes Concept Lancet (CoLan), a zero-shot, plug-and-play framework that jointly models visual concept representations in both text embedding and diffusion score spaces via sparse linear decomposition. CoLan enables image-adaptive estimation of editing strength and concept-level controllable transplantation (replacement, addition, or removal). Key contributions include: (1) CoLan-150K—the first large-scale, diffusion-oriented visual concept representation dataset; (2) a training-free, fine-tuning-free design enabling seamless integration with arbitrary diffusion editors; and (3) significant improvements in editing effectiveness and visual consistency across multiple baselines, achieving state-of-the-art performance.

📝 Abstract
Diffusion models are widely used for image editing tasks. Existing editing methods often design a representation manipulation procedure by curating an edit direction in the text embedding or score space. However, such a procedure faces a key challenge: overestimating the edit strength harms visual consistency while underestimating it fails the editing task. Notably, each source image may require a different editing strength, and it is costly to search for an appropriate strength via trial-and-error. To address this challenge, we propose Concept Lancet (CoLan), a zero-shot plug-and-play framework for principled representation manipulation in diffusion-based image editing. At inference time, we decompose the source input in the latent (text embedding or diffusion score) space as a sparse linear combination of the representations of the collected visual concepts. This allows us to accurately estimate the presence of concepts in each image, which informs the edit. Based on the editing task (replace/add/remove), we perform a customized concept transplant process to impose the corresponding editing direction. To sufficiently model the concept space, we curate a conceptual representation dataset, CoLan-150K, which contains diverse descriptions and scenarios of visual terms and phrases for the latent dictionary. Experiments on multiple diffusion-based image editing baselines show that methods equipped with CoLan achieve state-of-the-art performance in editing effectiveness and consistency preservation.
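The two-step recipe in the abstract (sparse decomposition of the source latent over a concept dictionary, then a concept-level transplant) can be sketched in a toy setting. The dictionary `D`, the ISTA solver, and the unit edit strength in `mode="add"` are all illustrative assumptions, stand-ins for the paper's actual text-embedding/score-space representations and its sparse solver:

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_concepts = 64, 20

# Columns of D play the role of collected concept representations
# (hypothetical stand-ins for CoLan's latent dictionary).
D = rng.normal(size=(dim, n_concepts))
D /= np.linalg.norm(D, axis=0)

def sparse_decompose(x, D, lam=0.1, n_iter=500):
    """ISTA for min_w 0.5*||x - D w||^2 + lam*||w||_1
    (a generic sparse-coding solver, assumed for illustration)."""
    w = np.zeros(D.shape[1])
    step = 1.0 / np.linalg.norm(D, 2) ** 2  # 1/L with L = ||D||_2^2
    for _ in range(n_iter):
        grad = D.T @ (D @ w - x)
        w = w - step * grad
        # soft-thresholding enforces sparsity
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)
    return w

def transplant(w, src, tgt, mode="replace"):
    """Concept-level edit of the coefficient vector."""
    w = w.copy()
    if mode == "replace":   # move the source concept's mass to the target
        w[tgt] += w[src]
        w[src] = 0.0
    elif mode == "remove":
        w[src] = 0.0
    elif mode == "add":
        w[tgt] += 1.0       # assumed unit strength for illustration
    return w

# A source latent built mostly from concept 3, plus noise.
x = 0.9 * D[:, 3] + 0.05 * rng.normal(size=dim)
w = sparse_decompose(x, D)                      # estimate concept presence
w_edit = transplant(w, src=3, tgt=7, mode="replace")
x_edit = D @ w_edit                             # edited latent for the editor
```

The key point the sketch captures: because the edit magnitude is taken from the recovered coefficient `w[src]` rather than a fixed hyperparameter, the editing strength adapts to how strongly the source concept is present in each image.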
Problem

Research questions and friction points this paper is trying to address.

Estimating optimal edit strength for visual consistency in diffusion models
Decomposing source images into sparse concept representations for accurate editing
Enhancing diffusion-based editing via concept transplantation and a curated dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decomposes the source input as a sparse linear combination of collected concept representations
Performs a customized concept transplant (replace/add/remove) in the latent space
Curates CoLan-150K, a conceptual representation dataset for the latent dictionary