DDA-Thinker: Decoupled Dual-Atomic Reinforcement Learning for Reasoning-Driven Image Editing

📅 2026-04-28

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limited performance of existing image editing models on tasks that require complex reasoning, which the authors attribute to weak reasoning-grounded planning. They propose DDA-Thinker, a Thinker-centric framework that decouples the reasoning planner (the Thinker) from the generative module (the Editor), so the Thinker can be optimized independently while the Editor stays fixed. Training uses a dual-atomic reward mechanism, in which verifiable checklists provide a cognitive reward for the Thinker's executable plan and a visual reward for the final image, together with a two-stage data pipeline that first synthesizes reasoning-focused data and then applies difficulty-aware refinement to build the training curriculum. On RISE-Bench and KRIS-Bench, DDA-Thinker substantially improves on baseline methods and lets a community model reach results competitive with strong proprietary models.
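The difficulty-aware refinement mentioned above can be pictured with a small sketch. The summary does not say how difficulty is measured, so everything here is an assumption made for illustration: difficulty is estimated from the mean reward of a few rollouts per sample, samples that are almost always solved or almost never solved are dropped, and the rest are ordered from easier to harder. The names `select_curriculum`, `low`, and `high` are hypothetical and do not come from the paper.

```python
from typing import Dict, List, Sequence


def mean_reward(rewards: Sequence[float]) -> float:
    """Average reward over the rollouts of one training sample."""
    return sum(rewards) / len(rewards)


def select_curriculum(
    rollout_rewards: Dict[str, Sequence[float]],
    low: float = 0.2,    # assumed lower bound: below this the sample is "too hard"
    high: float = 0.8,   # assumed upper bound: above this the sample is "too easy"
) -> List[str]:
    """Return sample ids worth training on, ordered from easier to harder.

    Hypothetical rule, not the paper's: keep a sample only if its mean
    rollout reward lies in [low, high], since samples outside that band
    give little learning signal.
    """
    kept = []
    for sample_id, rewards in rollout_rewards.items():
        if not rewards:
            continue  # no rollouts, so difficulty cannot be estimated
        score = mean_reward(rewards)
        if low <= score <= high:
            kept.append((sample_id, score))
    # A higher mean reward means an easier sample, so sort descending.
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [sample_id for sample_id, _ in kept]


if __name__ == "__main__":
    rewards = {
        "always_solved": [1.0, 1.0, 1.0, 1.0],
        "never_solved": [0.0, 0.0, 0.0, 0.0],
        "medium": [0.5, 0.5, 1.0, 0.0],
        "fairly_easy": [0.75, 0.75, 0.5, 0.5],
    }
    print(select_curriculum(rewards))
```

The band limits are free parameters in this sketch; a narrower band keeps fewer but more informative samples.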

📝 Abstract

Recent image editing models have achieved strong visual fidelity but often struggle with tasks requiring complex reasoning. To investigate and enhance the reasoning-grounded planning for image editing, we propose DDA-Thinker, a Thinker-centric framework designed for the independent optimization of a planning module (Thinker) over a fixed generative model (Editor). This decoupled Thinker-centric paradigm facilitates a controlled analysis of the planning module and makes its contribution under a fixed Editor easier to assess. To effectively guide this Thinker, we introduce a dual-atomic reinforcement learning framework. This framework decomposes feedback into two distinct atomic rewards implemented through verifiable checklists: a cognitive-atomic reward to directly assess the quality of the Thinker's executable plan, which serves as the actionable outcome of the Thinker's reasoning, and a visual-atomic reward to assess the final image quality. To improve checklist quality, our checklist synthesis is grounded not only in the source image and user instruction but also in a rational reference description of the ideal post-edit scene. To support this training, we further develop a two-stage data curation pipeline that first synthesizes a diverse and reasoning-focused dataset, then applies difficulty-aware refinement to curate an effective training curriculum for reinforcement learning. Extensive experiments on reasoning-driven image editing benchmarks, including RISE-Bench and KRIS-Bench, demonstrate that our approach substantially improves overall performance. Our method enables a community model to achieve results competitive with strong proprietary models, highlighting the practical potential of Thinker-centric optimization under a fixed-editor setting.
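To make the dual-atomic reward concrete, the following is a minimal sketch under stated assumptions. The abstract says only that the two rewards are implemented through verifiable checklists; scoring each checklist as the fraction of items passed and mixing the two scores with a fixed weight are assumptions made here for illustration, not the paper's stated formula. `ChecklistResult`, `atomic_reward`, `dual_atomic_reward`, and `cognitive_weight` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ChecklistResult:
    """Outcome of one verifiable checklist item (hypothetical structure)."""
    question: str
    passed: bool


def atomic_reward(results: Sequence[ChecklistResult]) -> float:
    """Fraction of checklist items that passed; 0.0 for an empty checklist."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def dual_atomic_reward(
    cognitive: Sequence[ChecklistResult],
    visual: Sequence[ChecklistResult],
    cognitive_weight: float = 0.5,  # assumed mixing weight, not from the paper
) -> float:
    """Combine the plan-level and image-level checklist scores.

    `cognitive` holds checks on the Thinker's executable plan and `visual`
    holds checks on the Editor's output image. The weighted sum is an
    illustrative choice of combination rule.
    """
    if not 0.0 <= cognitive_weight <= 1.0:
        raise ValueError("cognitive_weight must lie in [0, 1]")
    return (
        cognitive_weight * atomic_reward(cognitive)
        + (1.0 - cognitive_weight) * atomic_reward(visual)
    )


if __name__ == "__main__":
    plan_checks = [
        ChecklistResult("Does the plan name the object to change?", True),
        ChecklistResult("Does the plan state the expected end state?", False),
    ]
    image_checks = [
        ChecklistResult("Is the target object changed?", True),
        ChecklistResult("Is the background preserved?", True),
        ChecklistResult("Is the lighting consistent?", True),
        ChecklistResult("Is the result physically plausible?", False),
    ]
    print(dual_atomic_reward(plan_checks, image_checks))
```

Keeping the two scores separate until the last step makes it possible to log them individually, which matches the abstract's point that the plan and the final image are assessed by distinct rewards.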

Problem

Research questions and friction points this paper is trying to address.

reasoning-driven image editing

complex reasoning

planning module

visual fidelity

image editing benchmarks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoupled Planning

Dual-Atomic Reinforcement Learning

Reasoning-Driven Image Editing