🤖 AI Summary
Current multi-task image editing approaches rely on fragmented, task-specific reward models requiring separate supervised fine-tuning, limiting generalization and efficiency.
Method: This paper introduces OneReward—the first multi-task reinforcement learning framework built upon a unified vision-language model (VLM) as a single, homogeneous reward function. It eliminates task-specific fine-tuning and jointly optimizes diverse editing tasks—including inpainting, outpainting, object removal, and text rendering—using precise binary masks to localize edits. Multi-task preference learning is conducted directly on pretrained foundation models, bypassing task-specific adaptation.
Contribution/Results: OneReward significantly improves cross-task consistency and training efficiency. Its derived model, Seedream 3.0 Fill, achieves state-of-the-art performance across multiple objective and subjective metrics, outperforming Ideogram, Adobe Photoshop, and FLUX Fill[Pro] in both generation quality and task-agnostic coherence.
📝 Abstract
In this paper, we introduce OneReward, a unified reinforcement learning framework that enhances a model's generative capabilities across multiple tasks under different evaluation criteria using only *One Reward* model. By employing a single vision-language model (VLM) as the generative reward model, which distinguishes the winner from the loser for a given task and a given evaluation criterion, OneReward can be effectively applied to multi-task generation models, particularly in contexts with varied data and diverse task objectives. We apply OneReward to mask-guided image generation, which can be further divided into several sub-tasks (image fill, image extend, object removal, and text rendering), each involving a binary mask that specifies the edit area. Although these domain-specific tasks share the same conditioning paradigm, they differ significantly in underlying data distributions and evaluation metrics. Existing methods often rely on task-specific supervised fine-tuning (SFT), which limits generalization and training efficiency. Building on OneReward, we develop Seedream 3.0 Fill, a mask-guided generation model trained via multi-task reinforcement learning directly on a pre-trained base model, eliminating the need for task-specific SFT. Experimental results demonstrate that our unified edit model consistently outperforms both commercial and open-source competitors, such as Ideogram, Adobe Photoshop, and FLUX Fill [Pro], across multiple evaluation dimensions. Code and model are available at: https://one-reward.github.io
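The core idea of the abstract, one pairwise judge conditioned on a (task, criterion) pair serving every sub-task, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the VLM judge is replaced by a hypothetical stub (`judge_probability`), and the task and criterion names are only illustrative.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    """Stand-in for an edited image; `quality` replaces the visual
    features a real VLM judge would actually consume."""
    image_id: str
    quality: float

def judge_probability(task: str, criterion: str, a: Sample, b: Sample) -> float:
    """Hypothetical stub for P(a beats b | task, criterion): a
    Bradley-Terry-style sigmoid over a scalar quality gap. A real
    system would query the VLM reward model here."""
    return 1.0 / (1.0 + math.exp(-(a.quality - b.quality)))

def preference_reward(task: str, criterion: str,
                      candidate: Sample, reference: Sample) -> float:
    """Binary win/lose reward for `candidate` against `reference`,
    matching the winner-vs-loser judgment the abstract describes."""
    return 1.0 if judge_probability(task, criterion, candidate, reference) > 0.5 else -1.0

# One reward function serves every sub-task: only the (task, criterion)
# conditioning changes, never the reward model itself.
tasks = ["image_fill", "image_extend", "object_removal", "text_rendering"]
cand, ref = Sample("edit_a", 0.9), Sample("edit_b", 0.4)
rewards = {t: preference_reward(t, "overall_quality", cand, ref) for t in tasks}
```

In a full RL pipeline, this preference signal would drive the policy update on the pre-trained generator; the point of the sketch is only that a single judge interface covers all four mask-guided sub-tasks.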