🤖 AI Summary
This work addresses the insufficient synergy between personalized understanding and generation in existing unified multimodal models, which hinders effective cross-task reasoning. To overcome this limitation, we propose Sync-R1, a framework that introduces an explicit collaborative reasoning mechanism and jointly optimizes understanding and generation through end-to-end cooperative reinforcement learning under a unified reward system. The core contributions are the Sync-GRPO algorithm, a Dynamic Group Scaling (DGS) strategy, and UnifyBench++, a more realistic and comprehensive evaluation benchmark. Experimental results on UnifyBench++ show that Sync-R1 improves cross-task reasoning and personalization performance without requiring complex cold-start procedures.
📝 Abstract
Unified Multimodal Models (UMMs) excel at general tasks but struggle to bridge the gap between personalized understanding and generation. Prior work largely relies on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method that uses an ensemble reward system. Furthermore, we propose Dynamic Group Scaling (DGS), which adaptively filters low-potential trajectories to reduce gradient variance and accelerate convergence. To better reflect real-world complexity, we introduce UnifyBench++, featuring denser textual descriptions and richer user contexts. Experimental results demonstrate that Sync-R1 achieves state-of-the-art performance, showcasing superior cross-task reasoning and robust personalization without requiring complex cold-start procedures. The code and the UnifyBench++ dataset will be released at: https://github.com/arctanxarc/UniCTokens.
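The abstract does not spell out how Sync-GRPO or DGS is computed, so the following is only a minimal sketch of the generic ingredients such a method would plausibly build on: a weighted ensemble reward, GRPO-style group-relative advantages (rewards standardized within one sampled group), and a simple top-k trajectory filter. The function names, the reward keys `understanding` and `generation`, the weights, and the top-k criterion are all assumptions made for illustration. In particular, the filter is a stand-in and is not claimed to be the DGS rule.

```python
from statistics import mean, pstdev


def ensemble_reward(scores, weights):
    """Weighted sum of per-criterion scores (hypothetical reward mix)."""
    return sum(weights[name] * scores[name] for name in weights)


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def filter_low_potential(rewards, keep_ratio=0.5):
    """Illustrative top-k filter (an assumption, not the paper's DGS rule).

    Returns the indices of the highest-reward trajectories, in original order.
    """
    k = max(2, int(len(rewards) * keep_ratio))
    ranked = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    return sorted(ranked[:k])


# Hypothetical group of four sampled trajectories with two reward criteria.
weights = {"understanding": 0.5, "generation": 0.5}
group_scores = [
    {"understanding": 1.0, "generation": 0.0},
    {"understanding": 1.0, "generation": 1.0},
    {"understanding": 0.0, "generation": 0.0},
    {"understanding": 0.0, "generation": 1.0},
]
rewards = [ensemble_reward(s, weights) for s in group_scores]
advantages = group_relative_advantages(rewards)
kept = filter_low_potential(rewards, keep_ratio=0.5)
```

In this toy group, the trajectory that scores well on both criteria receives a positive advantage and the one that fails both receives a negative one, which is the signal a policy-gradient update would then use. How the actual method weights its rewards or decides which trajectories to drop would have to be taken from the paper or the released code.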