🤖 AI Summary
This study addresses three key challenges: weak reasoning capabilities in multimodal language models (MLLMs), low controllability in diffusion models (DMs), and the absence of a synergistic optimization mechanism between them. To this end, we propose UniRL-Zero, the first unified reinforcement learning (RL) framework for the joint optimization of understanding and generation. Methodologically, we design six cross-modal RL scenarios and establish bidirectional reward signals linking language understanding and visual generation, enabling end-to-end co-training of MLLM and DM experts. Our contributions are threefold: (1) introducing the first benchmark for understanding-generation joint RL; (2) achieving significant improvements in cross-modal reasoning and controllable generation across diverse multimodal tasks; and (3) open-sourcing the codebase and training protocols to advance research in interactive multimodal learning.
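The bidirectional reward idea above can be sketched in a few lines. The functions below are hypothetical illustrations (not the UniRL-Zero implementation): a toy understanding-side reward where the MLLM scores how well a generation covers a caption, a toy generation-side reward where grounded outputs check the MLLM's answer, and a weighted combination so each expert is also credited for the other's success.

```python
# Hypothetical sketch of bidirectional rewards for joint MLLM/DM RL training.
# All function names and the reward definitions are illustrative assumptions,
# not the actual UniRL-Zero reward design.

def mllm_reward_for_generation(caption: str, generated_tokens: list[str]) -> float:
    """Understanding-side reward: fraction of caption words the generation covers."""
    caption_words = set(caption.lower().split())
    covered = sum(1 for t in set(generated_tokens) if t.lower() in caption_words)
    return covered / max(len(caption_words), 1)

def dm_reward_for_understanding(answer: str, reference: str) -> float:
    """Generation-side reward: exact-match check on the MLLM's answer."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0

def joint_reward(r_understand: float, r_generate: float, alpha: float = 0.5) -> float:
    """Bidirectional signal: a convex combination credits both experts jointly."""
    return alpha * r_understand + (1.0 - alpha) * r_generate
```

In an actual RL loop, `joint_reward` would weight policy-gradient updates for both the MLLM and the DM, so improvements on either side raise the shared training signal.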
📝 Abstract
We present UniRL-Zero, a unified reinforcement learning (RL) framework that boosts multimodal language model understanding and reasoning, diffusion model multimedia generation, and their beneficial interaction within a unified model. Our work defines six scenarios for unified-model reinforcement learning, providing systematic baselines for RL on unified understanding and generation models. Our code is available at https://github.com/G-U-N/UniRL.