🤖 AI Summary
This work proposes VisionCreator-R1, a native visual generation agent with explicit reflection capabilities. It targets a common weakness of existing visual generation agents: they rely largely on plan-driven paradigms and often lack systematic mechanisms for identifying and correcting visual errors during generation. The authors introduce a Reflection-Plan Co-Optimization (RPCO) training framework. RPCO first performs staged fine-tuning on a newly curated VCR-SFT dataset and then applies reinforcement learning on the VCR-RL dataset to jointly optimize reflection and planning. Their analysis reveals an asymmetry between reflection and planning during reinforcement learning, and this finding shapes the RPCO strategy so that both components improve together. On established benchmarks and the newly introduced VCR-bench, VisionCreator-R1 outperforms Gemini 2.5 Pro on both single-image and multi-image generation tasks.
📝 Abstract
Visual content generation has advanced from single-image to multi-image workflows, yet existing agents remain largely plan-driven and lack systematic reflection mechanisms to correct mid-trajectory visual errors. To address this limitation, we propose VisionCreator-R1, a native visual generation agent with explicit reflection, together with a Reflection-Plan Co-Optimization (RPCO) training methodology. Through extensive experiments and trajectory-level analysis, we uncover a reflection-plan optimization asymmetry in reinforcement learning (RL): planning can be reliably optimized via plan rewards, while reflection learning is hindered by noisy credit assignment. Guided by this insight, RPCO first trains on our self-constructed VCR-SFT dataset, which contains reflection-strong single-image trajectories and planning-strong multi-image trajectories, and then co-optimizes reflection and planning on the VCR-RL dataset via RL. The result is our unified VisionCreator-R1 agent, which consistently outperforms Gemini 2.5 Pro on existing benchmarks and on our VCR-bench, covering both single-image and multi-image tasks.
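The credit-assignment argument behind the asymmetry can be illustrated with a toy simulation. This is a minimal sketch under our own assumptions, not the authors' RPCO reward design. It assumes that planning receives a direct, low-noise plan reward, whereas reflection receives only the trajectory-level outcome broadcast back to it, and that this outcome also depends on plan quality and on stochastic image-generator noise. Every name and parameter here (`simulate`, `reflection_effect`, `generator_noise`, `plan_reward_noise`) is hypothetical. The sketch measures how well each credit signal tracks the true quality of the step it is meant to reward.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def simulate(n_traj=2000, plan_effect=1.0, reflection_effect=0.3,
             generator_noise=1.0, plan_reward_noise=0.1, seed=0):
    """Toy model (hypothetical): compare credit quality for plan vs. reflection steps."""
    rng = random.Random(seed)
    plan_q, plan_credit = [], []
    refl_q, refl_credit = [], []
    for _ in range(n_traj):
        good_plan = float(rng.random() < 0.5)
        good_reflection = float(rng.random() < 0.5)
        # Assumed plan reward: scores the plan directly, so it is a low-noise signal.
        plan_credit.append(good_plan + rng.gauss(0.0, plan_reward_noise))
        # Assumed reflection credit: only the final outcome, which mixes in plan
        # quality and generator noise that are unrelated to reflection quality.
        outcome = (plan_effect * good_plan
                   + reflection_effect * good_reflection
                   + rng.gauss(0.0, generator_noise))
        refl_credit.append(outcome)
        plan_q.append(good_plan)
        refl_q.append(good_reflection)
    return pearson(plan_q, plan_credit), pearson(refl_q, refl_credit)


if __name__ == "__main__":
    plan_corr, refl_corr = simulate()
    print(f"plan credit vs. plan quality:             {plan_corr:.2f}")
    print(f"reflection credit vs. reflection quality: {refl_corr:.2f}")
```

Under these assumptions the plan credit tracks plan quality almost perfectly, while the broadcast outcome is only weakly correlated with reflection quality. A policy-gradient update driven by the second signal would therefore reinforce good and bad reflections almost indiscriminately. Raising `generator_noise` or `plan_effect` weakens the reflection signal further.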