Region-Constrained Group Relative Policy Optimization for Flow-Based Image Editing

📅 2026-04-10

🤖 AI Summary
This work addresses a weakness of existing instruction-guided image editing methods: global exploration also perturbs non-target regions, which inflates within-group reward variance and makes credit assignment inaccurate. The authors propose RC-GRPO-Editing, a region-constrained Group Relative Policy Optimization (GRPO) post-training framework for flow-based editing models under deterministic ODE sampling. It localizes policy exploration through region-decoupled initial noise perturbations and adds an attention concentration reward that steers cross-attention toward the target region throughout the rollout. Together, these enable more precise local credit assignment. On the CompBench benchmark, the approach shows consistent improvements in both instruction adherence within edited regions and content preservation in non-target areas.
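To make the credit-assignment point concrete, the sketch below shows the within-group reward standardization that GRPO-style methods commonly use. It is an illustration only, not the authors' implementation: the function name `group_relative_advantages` and the `eps` stabilizer are assumptions introduced here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one rollout group (GRPO-style sketch).

    rewards: list of scalar rewards, one per rollout in the group.
    eps:     small constant (an assumption) that avoids division by zero
             when every rollout receives the same reward.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population standard deviation of the group
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for a group of four rollouts of the same edit.
advantages = group_relative_advantages([0.2, 0.5, 0.6, 0.9])
```

Because each advantage is divided by the group's standard deviation, any reward spread caused by changes outside the editing region enters that denominator and the per-sample ranking, which is the nuisance variance the summary describes.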

📝 Abstract
Instruction-guided image editing requires balancing target modification with non-target preservation. Recently, flow-based models have emerged as a strong and increasingly adopted backbone for instruction-guided image editing, thanks to their high fidelity and efficient deterministic ODE sampling. Building on this foundation, GRPO-based reward-driven post-training has been explored to directly optimize editing-specific rewards, improving instruction following and editing consistency. However, existing methods often suffer from noisy credit assignment: global exploration also perturbs non-target regions, inflating within-group reward variance and yielding noisy GRPO advantages. To address this, we propose RC-GRPO-Editing, a region-constrained GRPO post-training framework for flow-based image editing under deterministic ODE sampling. It suppresses background-induced nuisance variance to enable cleaner localized credit assignment, improving instruction adherence in the editing region while preserving non-target content. Concretely, we localize exploration via region-decoupled initial noise perturbations to reduce background-induced reward variance and stabilize GRPO advantages, and introduce an attention concentration reward that aligns cross-attention with the intended editing region throughout the rollout, reducing unintended changes in non-target regions. Experiments on CompBench show consistent improvements in instruction adherence in the editing region and in non-target preservation.
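The two mechanisms named in the abstract can be illustrated with a minimal sketch. The abstract does not give formulas, so everything below is an assumption made for illustration: the variance-preserving mix with strength `alpha`, the binary `mask` over flattened latent positions, and the definition of the concentration score as the share of attention mass inside the mask. The function names are hypothetical.

```python
import math
import random


def region_decoupled_noise(base, mask, alpha, rng):
    """Perturb a shared initial noise only inside the editing region.

    base:  list of floats, initial noise shared by the whole rollout group
    mask:  list of 0/1 flags, 1 marks the editing region (same length as base)
    alpha: mixing strength in [0, 1]; 0 leaves the base noise unchanged
    rng:   random.Random instance, passed in for reproducibility
    """
    keep = math.sqrt(1.0 - alpha * alpha)  # keeps unit variance inside the mask
    out = []
    for z, m in zip(base, mask):
        if m:
            out.append(keep * z + alpha * rng.gauss(0.0, 1.0))
        else:
            out.append(z)  # background noise is identical across the group
    return out


def attention_concentration(attn, mask, eps=1e-8):
    """Share of (non-negative) attention mass that falls inside the mask."""
    inside = sum(a for a, m in zip(attn, mask) if m)
    return inside / (sum(attn) + eps)


# Hypothetical usage: a group of four rollouts over an 8-element latent.
rng = random.Random(0)
base = [rng.gauss(0.0, 1.0) for _ in range(8)]
mask = [0, 0, 1, 1, 1, 0, 0, 0]
group = [region_decoupled_noise(base, mask, 0.5, rng) for _ in range(4)]
```

Under deterministic ODE sampling the initial noise is the only source of randomness, so sharing the background noise across the group means that differences between rollouts originate inside the editing region; that is the intuition behind localized exploration in this sketch.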
Problem

Research questions and friction points this paper is trying to address.

instruction-guided image editing
flow-based models
credit assignment
region preservation
editing consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

region-constrained
GRPO
flow-based image editing
attention concentration
deterministic ODE sampling