SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning

📅 2026-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Online reinforcement learning for image editing is hindered by the absence of fine-grained, reliable reward signals: existing evaluators often suffer from "attention collapse," neglecting cross-image comparisons and detail-aware perception. This work proposes the first reward modeling approach driven by explicit spatial reasoning, which aligns semantic judgments with spatial awareness by predicting edited regions and grounding them in pixel-level evidence. To support this framework, we construct a spatially aware training dataset of 260,000 samples and integrate it into an online reinforcement learning pipeline. Our method achieves state-of-the-art performance across multiple benchmarks, including MMRB2, EditReward-Bench, and MultiEditReward-Bench. When deployed as a reward signal, it improves OmniGen2 by 0.90 points on GEdit-Bench, surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45).

📝 Abstract
Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term "Attention Collapse," where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted edit regions, SpatialReward grounds semantic judgments in pixel-level evidence, significantly enhancing evaluative accuracy. Trained on a curated 260k spatial-aware dataset, our model achieves state-of-the-art performance on MMRB2 and EditReward-Bench, and outperforms proprietary evaluators on our proposed MultiEditReward-Bench. Furthermore, SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench, surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45). These results demonstrate that spatial reasoning is essential for unlocking effective alignment in image editing.
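As an illustrative sketch only, not the paper's actual formulation: the idea of grounding a semantic judgment in pixel-level evidence from a predicted edit region can be expressed as a reward that discounts edits leaking outside that region. All function and variable names below are hypothetical.

```python
import numpy as np

def spatially_grounded_reward(src, edited, region_mask, semantic_score):
    """Hypothetical sketch of a spatially grounded edit reward.

    src, edited: float arrays in [0, 1], shape (H, W)
    region_mask: boolean array, True where the edit is predicted to occur
    semantic_score: scalar in [0, 1] from a semantic judge (e.g. an MLLM)

    The semantic judgment is rescaled by pixel-level evidence: changes
    should concentrate inside the predicted edit region, and pixels
    outside it should stay untouched.
    """
    diff = np.abs(edited - src)
    inside = diff[region_mask].mean() if region_mask.any() else 0.0
    outside = diff[~region_mask].mean() if (~region_mask).any() else 0.0
    # Locality is ~1 when all change falls inside the region, ~0.5 when
    # change is spread evenly, and near 0 when the edit leaks entirely out.
    locality = inside / (inside + outside + 1e-8)
    return semantic_score * locality
```

Under this toy formulation, an edit confined to the predicted region keeps nearly the full semantic score, while one that alters the whole image is cut roughly in half, capturing (in spirit) why spatial grounding penalizes perceptually sloppy edits.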
Problem

Research questions and friction points this paper is trying to address.

Online Reinforcement Learning
Image Editing
Reward Signal
Attention Collapse
Perception Gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

SpatialReward
spatial reasoning
online reinforcement learning
image editing
reward modeling
Yancheng Long
Harbin Institute of Technology, Shenzhen
Yankai Yang
Harbin Institute of Technology, Shenzhen
Hongyang Wei
Tsinghua Shenzhen International Graduate School, Tsinghua University
Wei Chen
HKUST
Computer Vision; Vision-Language
Tianke Zhang
Tsinghua University; Kuaishou Technology
Computer Vision; Natural Language Processing
Haonan Fan
Kuaishou Technology
Changyi Liu
Kuaishou Technology
Kaiyu Jiang
Kuaishou
MLLM
Jiankang Chen
Kuaishou Technology
Kaiyu Tang
Kuaishou Technology
Bin Wen
Kuaishou
MLLM
Fan Yang
Kuaishou Technology
Tingting Gao
Kuaishou Technology
Han Li
Kuaishou Technology
Shuo Yang
Professor, Harbin Institute of Technology (Shenzhen)
Data-Centric AI; Trustworthy AI; Machine Learning; Computer Vision