Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning

📅 2026-05-11
📈 Citations: 0
Influential citations: 0

🤖 AI Summary
This work addresses the weak synergy between personalized understanding and personalized generation in existing unified multimodal models, which limits cross-task reasoning. The authors propose Sync-R1, a framework with an explicit collaborative reasoning mechanism that jointly optimizes understanding and generation through end-to-end cooperative reinforcement learning under a unified reward system. The core contributions are the Sync-GRPO algorithm, the Dynamic Group Scaling (DGS) strategy, and UnifyBench++, a benchmark with denser textual descriptions and richer user contexts. According to the reported experiments, Sync-R1 improves cross-task reasoning and personalization performance on UnifyBench++ without requiring complex cold-start procedures.
📝 Abstract
Unified Multimodal Models (UMMs) excel in general tasks but struggle to bridge the gap between personalized understanding and generation. Prior works largely rely on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method utilizing an ensemble reward system. Furthermore, we propose Dynamic Group Scaling (DGS), which adaptively filters low-potential trajectories to reduce gradient variance and accelerate convergence. To better reflect real-world complexity, we introduce UnifyBench++, featuring denser textual descriptions and richer user contexts. Experimental results demonstrate that Sync-R1 achieves state-of-the-art performance, showcasing superior cross-task reasoning and robust personalization without requiring complex cold-start procedures. The code and the UnifyBench++ dataset will be released at: https://github.com/arctanxarc/UniCTokens.
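The abstract names three ingredients without giving formulas: an ensemble reward, group-based policy optimization (Sync-GRPO), and trajectory filtering (DGS). The sketch below is only an illustration of how such pieces could fit together, not the authors' implementation. It assumes the ensemble reward is a weighted sum of an understanding score and a generation score, uses the standard GRPO group-relative advantage (reward minus group mean, divided by group standard deviation), and stands in for DGS with a simple "keep the top fraction by reward" rule. The function names, the weights, and the `keep_fraction` parameter are all hypothetical.

```python
import statistics


def ensemble_reward(understanding_score, generation_score,
                    w_understanding=0.5, w_generation=0.5):
    """Combine two task scores into one scalar reward.

    Assumption: a weighted sum. The abstract does not specify the form.
    """
    return w_understanding * understanding_score + w_generation * generation_score


def filter_group(rewards, keep_fraction=0.5):
    """Return indices of the trajectories kept for the policy update.

    Hypothetical stand-in for Dynamic Group Scaling: keep the
    highest-reward fraction of the group, but never fewer than two
    trajectories, so a group-relative advantage stays meaningful.
    """
    keep = min(len(rewards), max(2, round(len(rewards) * keep_fraction)))
    ranked = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    return sorted(ranked[:keep])


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: normalize rewards within the group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled trajectories, each scored as (understanding, generation).
    scores = [(0.9, 0.8), (0.2, 0.4), (0.6, 0.7), (0.1, 0.1)]
    rewards = [ensemble_reward(u, g) for u, g in scores]
    kept = filter_group(rewards, keep_fraction=0.5)
    advantages = group_relative_advantages([rewards[i] for i in kept])
    for i, a in zip(kept, advantages):
        print(f"trajectory {i}: reward={rewards[i]:.2f} advantage={a:+.2f}")
```

In this toy setup both task scores feed one scalar, so a trajectory that understands the user context well but generates poorly is penalized in the same update as one with the opposite failure. That is one simple way to read the "integrated reward landscape" described above; the paper's actual reward and filtering criteria may differ.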
Problem

Research questions and friction points this paper is trying to address.

personalized understanding
personalized generation
multimodal models
reasoning gap
synergy
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-operative reinforcement learning
personalized reasoning
unified multimodal models
Dynamic Group Scaling
Sync-GRPO