CoCoEdit: Content-Consistent Image Editing via Region Regularized Reinforcement Learning

📅 2026-02-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that existing image editing models often compromise content consistency in non-target regions when modifying specified areas. To mitigate this, we propose a region-regularized reinforcement learning post-training framework that jointly optimizes pixel-level similarity rewards and semantic rewards derived from a multimodal large language model (MLLM). This approach guides the model to preserve non-edited regions faithfully while maintaining high editing fidelity. Leveraging a high-quality training set of 40K samples annotated with fine-grained masks and editing instructions, our method incorporates a regularization mechanism that retains original content for high-reward samples and enhances edits for low-reward ones. Evaluated on Qwen-Image-Edit and FLUX-Kontext, our approach achieves editing quality comparable to state-of-the-art methods while significantly improving content consistency, as evidenced by superior PSNR/SSIM metrics and human evaluations.

📝 Abstract
Image editing has achieved impressive results with the development of large-scale generative models. However, existing models mainly focus on the editing effects in intended objects and regions, often leading to unwanted changes in unintended regions. We present a post-training framework for Content-Consistent Editing (CoCoEdit) via region-regularized reinforcement learning. We first augment existing editing datasets with refined instructions and masks, from which 40K diverse, high-quality samples are curated as the training set. We then introduce a pixel-level similarity reward to complement MLLM-based rewards, enabling models to ensure both editing quality and content consistency during the editing process. To overcome the spatially agnostic nature of these rewards, we propose a region-based regularizer that preserves non-edited regions for high-reward samples while encouraging editing effects for low-reward samples. For evaluation, we annotate editing masks for GEdit-Bench and ImgEdit-Bench, introducing pixel-level similarity metrics to measure content consistency alongside editing quality. Applying CoCoEdit to Qwen-Image-Edit and FLUX-Kontext, we achieve not only editing scores competitive with state-of-the-art models, but also significantly better content consistency, as measured by PSNR/SSIM metrics and human subjective ratings.
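The reward design sketched in the abstract — a pixel-level consistency reward restricted to the non-edited region, combined with a semantic (MLLM-based) reward, plus a region regularizer that keeps original content for high-reward samples — can be illustrated roughly as below. This is a minimal sketch under stated assumptions: the function names, the PSNR-based normalization, the mixing weight, and the reward threshold are illustrative choices, not the paper's exact formulation, and images are assumed to be single-channel float arrays in [0, 1].

```python
import numpy as np

def masked_psnr(src, edited, mask, max_val=1.0):
    """PSNR computed only over non-edited pixels (mask == 0).

    src, edited: (H, W) float arrays in [0, max_val] (illustrative assumption).
    mask: (H, W) array, nonzero inside the region the instruction should edit.
    """
    keep = mask == 0
    if not np.any(keep):
        return 0.0  # nothing outside the edit region to compare
    mse = np.mean((src[keep] - edited[keep]) ** 2)
    if mse == 0:
        return 100.0  # cap for a perfectly preserved non-edited region
    return 10.0 * np.log10(max_val ** 2 / mse)

def consistency_reward(src, edited, mask, psnr_cap=50.0):
    """Normalize the masked PSNR into a [0, 1] pixel-level reward."""
    return min(masked_psnr(src, edited, mask), psnr_cap) / psnr_cap

def combined_reward(semantic_reward, pixel_reward, alpha=0.5):
    """Blend an MLLM-derived semantic reward with the pixel-level reward.

    alpha is a hypothetical mixing weight; the paper's weighting may differ.
    """
    return alpha * semantic_reward + (1.0 - alpha) * pixel_reward

def region_regularized_target(src, edited, mask, reward, threshold=0.7):
    """Region-based regularizer (illustrative threshold rule):
    for high-reward samples, restore original content outside the mask so the
    model is supervised toward preserving non-edited regions; for low-reward
    samples, keep the edit everywhere to encourage stronger editing effects.
    """
    if reward >= threshold:
        return np.where(mask == 0, src, edited)
    return edited
```

A usage pattern: score each rollout with `combined_reward`, then build its training target with `region_regularized_target`, so that well-edited samples additionally teach pixel-exact preservation of the background while poorly edited samples are pushed toward the requested change.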
Problem

Research questions and friction points this paper is trying to address.

image editing
content consistency
unintended changes
region preservation
editing fidelity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Content-Consistent Editing
Region-Regularized Reinforcement Learning
Pixel-Level Similarity Reward
Image Editing Consistency
Mask-Guided Editing
Yuhui Wu — The Hong Kong Polytechnic University (PolyU); Image/Video Editing, Low-Light Enhancement
Chenxi Xie — The Hong Kong Polytechnic University, Hong Kong; OPPO Research Institute, Shenzhen, China
Ruibin Li — University of Toronto; Persistent Memory, File Systems
Liyi Chen — PhD student at The Hong Kong Polytechnic University
Qiaosi Yi — The Hong Kong Polytechnic University, Hong Kong; OPPO Research Institute, Shenzhen, China
Lei Zhang — Chair Professor, Dept. of Computing, The Hong Kong Polytechnic University; Computer Vision, Image Processing, Pattern Recognition, Machine Learning