CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing image editing models often compromise content consistency in non-target regions when modifying specified areas. To mitigate this, we propose a region-regularized reinforcement learning post-training framework that jointly optimizes pixel-level similarity rewards and semantic rewards derived from a multimodal large language model (MLLM). This approach guides the model to preserve non-edited regions faithfully while maintaining high editing fidelity. Leveraging a high-quality training set of 40K samples annotated with fine-grained masks and editing instructions, our method incorporates a regularization mechanism that retains original content for high-reward samples and enhances edits for low-reward ones. Evaluated on Qwen-Image-Edit and FLUX-Kontext, our approach achieves editing quality comparable to state-of-the-art methods while significantly improving content consistency, as evidenced by superior PSNR/SSIM metrics and human evaluations.

📝 Abstract
Image editing has achieved impressive results with the development of large-scale generative models. However, existing models mainly focus on the editing effects in intended objects and regions, often leading to unwanted changes in unintended regions. We present a post-training framework for Content-Consistent Editing (CoCoEdit) via region-regularized reinforcement learning. We first augment existing editing datasets with refined instructions and masks, from which 40K diverse, high-quality samples are curated as the training set. We then introduce a pixel-level similarity reward to complement MLLM-based rewards, enabling models to ensure both editing quality and content consistency during the editing process. To overcome the spatially agnostic nature of these rewards, we propose a region-based regularizer that preserves non-edited regions for high-reward samples while encouraging editing effects for low-reward samples. For evaluation, we annotate editing masks for GEdit-Bench and ImgEdit-Bench, introducing pixel-level similarity metrics to measure content consistency alongside editing quality. Applying CoCoEdit to Qwen-Image-Edit and FLUX-Kontext, we achieve not only editing scores competitive with state-of-the-art models, but also significantly better content consistency, as measured by PSNR/SSIM metrics and human subjective ratings.
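The reward design sketched in the abstract — a pixel-level consistency reward restricted to the non-edited region, combined with a semantic (MLLM-based) reward, plus a region regularizer that keeps original content for high-reward samples — can be illustrated roughly as below. This is a minimal sketch under stated assumptions: the function names, the PSNR-based normalization, the mixing weight, and the reward threshold are illustrative choices, not the paper's exact formulation, and images are assumed to be single-channel float arrays in [0, 1].

```python
import numpy as np

def masked_psnr(src, edited, mask, max_val=1.0):
    """PSNR computed only over non-edited pixels (mask == 0).

    src, edited: (H, W) float arrays in [0, max_val] (illustrative assumption).
    mask: (H, W) array, nonzero inside the region the instruction should edit.
    """
    keep = mask == 0
    if not np.any(keep):
        return 0.0  # nothing outside the edit region to compare
    mse = np.mean((src[keep] - edited[keep]) ** 2)
    if mse == 0:
        return 100.0  # cap for a perfectly preserved non-edited region
    return 10.0 * np.log10(max_val ** 2 / mse)

def consistency_reward(src, edited, mask, psnr_cap=50.0):
    """Normalize the masked PSNR into a [0, 1] pixel-level reward."""
    return min(masked_psnr(src, edited, mask), psnr_cap) / psnr_cap

def combined_reward(semantic_reward, pixel_reward, alpha=0.5):
    """Blend an MLLM-derived semantic reward with the pixel-level reward.

    alpha is a hypothetical mixing weight; the paper's weighting may differ.
    """
    return alpha * semantic_reward + (1.0 - alpha) * pixel_reward

def region_regularized_target(src, edited, mask, reward, threshold=0.7):
    """Region-based regularizer (illustrative threshold rule):
    for high-reward samples, restore original content outside the mask so the
    model is supervised toward preserving non-edited regions; for low-reward
    samples, keep the edit everywhere to encourage stronger editing effects.
    """
    if reward >= threshold:
        return np.where(mask == 0, src, edited)
    return edited
```

A usage pattern: score each rollout with `combined_reward`, then build its training target with `region_regularized_target`, so that well-edited samples additionally teach pixel-exact preservation of the background while poorly edited samples are pushed toward the requested change.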
Problem

Research questions and friction points this paper is trying to address.

image editing
content consistency
unintended changes
region preservation
editing fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content-Consistent Editing
Region-Regularized Reinforcement Learning
Pixel-Level Similarity Reward
Image Editing Consistency
Mask-Guided Editing
Yuhui Wu — The Hong Kong Polytechnic University (PolyU); Image/Video Editing, Low-Light Enhancement
Chenxi Xie — The Hong Kong Polytechnic University, Hong Kong; OPPO Research Institute, Shenzhen, China
Ruibin Li — University of Toronto; Persistent Memory, File Systems
Liyi Chen — PhD student at The Hong Kong Polytechnic University
Qiaosi Yi — The Hong Kong Polytechnic University, Hong Kong; OPPO Research Institute, Shenzhen, China
Lei Zhang — Chair Professor, Dept. of Computing, The Hong Kong Polytechnic University; Computer Vision, Image Processing, Pattern Recognition, Machine Learning