Uni-Synergy: Bridging Understanding and Generation for Personalized Reasoning via Co-operative Reinforcement Learning

📅 2026-05-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the insufficient synergy between personalized understanding and generation in existing unified multimodal models, which hinders effective cross-task reasoning. To overcome this limitation, we propose Sync-R1, a novel framework that introduces an explicit collaborative reasoning mechanism, jointly optimizing understanding and generation through end-to-end cooperative reinforcement learning under a unified reward system. The core contributions include the Sync-GRPO algorithm, a dynamic group scaling (DGS) strategy, and UnifyBench++, a more realistic and comprehensive benchmark for evaluation. Experimental results demonstrate that Sync-R1 significantly enhances cross-task reasoning capabilities and personalization performance on UnifyBench++, all without requiring complex cold-start procedures.

📝 Abstract

Unified Multimodal Models (UMMs) excel in general tasks but struggle to bridge the gap between personalized understanding and generation. Prior works largely rely on implicit token-level alignment via supervised fine-tuning, which fails to fully capture the potential synergy between comprehension and creation. In this work, we propose Sync-R1, an end-to-end reinforcement learning framework that jointly optimizes personalized understanding and generation within a single, explicit reasoning loop. Through this unified feedback process, Sync-R1 enables personalized comprehension to guide content creation, while the resulting generation quality reciprocally refines understanding within an integrated reward landscape. To efficiently orchestrate this dual-task synergy, we introduce Sync-GRPO, a reinforcement learning method utilizing an ensemble reward system. Furthermore, we propose Dynamic Group Scaling (DGS), which adaptively filters low-potential trajectories to reduce gradient variance and accelerate convergence. To better reflect real-world complexity, we introduce UnifyBench++, featuring denser textual descriptions and richer user contexts. Experimental results demonstrate that Sync-R1 achieves state-of-the-art performance, showcasing superior cross-task reasoning and robust personalization without requiring complex cold-start procedures. The code and the UnifyBench++ dataset will be released at: https://github.com/arctanxarc/UniCTokens.
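The abstract names three mechanisms (an ensemble reward, a GRPO-style group update, and trajectory filtering under DGS) without giving formulas. As a rough illustration only, the sketch below combines per-criterion scores into one reward, drops the lowest-scoring trajectories in a group, and computes the group-relative advantage commonly used in GRPO (reward minus group mean, divided by group standard deviation). The weighted-sum reward, the reward names, the top-fraction filter, and the `keep_ratio` parameter are all assumptions made for this example; they are not taken from the paper and may differ from the actual Sync-GRPO and DGS definitions.

```python
from statistics import mean, pstdev


def ensemble_reward(scores, weights):
    """Weighted sum of per-criterion scores.

    Assumption: the paper's ensemble reward is not specified in the abstract;
    a weighted sum with hypothetical criterion names is used here.
    """
    return sum(weights[name] * scores[name] for name in weights)


def filter_low_potential(rewards, keep_ratio=0.5):
    """Return indices of the top-scoring fraction of a group.

    Assumption: a placeholder for Dynamic Group Scaling. The abstract only
    says low-potential trajectories are filtered adaptively; ranking by
    reward and keeping a fixed fraction is a simplification.
    """
    k = max(2, int(len(rewards) * keep_ratio))  # keep at least two samples
    ranked = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)
    return sorted(ranked[:k])


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style normalization: (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical scores for four sampled trajectories of one prompt.
weights = {"understanding": 0.5, "generation": 0.5}
group = [
    {"understanding": 0.9, "generation": 0.7},
    {"understanding": 0.2, "generation": 0.4},
    {"understanding": 0.6, "generation": 0.8},
    {"understanding": 0.1, "generation": 0.1},
]

rewards = [ensemble_reward(s, weights) for s in group]
kept = filter_low_potential(rewards, keep_ratio=0.5)
advantages = group_relative_advantages([rewards[i] for i in kept])
```

In this toy setup, a single scalar reward covers both tasks, so a trajectory that understands the user context well but generates poorly is penalized in the same update as one with the opposite failure. That is one simple way to read the "unified reward" idea, not necessarily the authors' formulation.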

Problem

Research questions and friction points this paper is trying to address.

personalized understanding

personalized generation

multimodal models

reasoning gap

synergy

Innovation

Methods, ideas, or system contributions that make the work stand out.

co-operative reinforcement learning

personalized reasoning

unified multimodal models