Learning to Credit the Right Steps: Objective-aware Process Optimization for Visual Generation

📅 2026-04-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing GRPO-based post-training for visual generation: multi-objective rewards are collapsed into a single static scalar and applied uniformly across the trajectory, which ignores the different roles that individual denoising steps play in the diffusion process. The authors propose Objective-aware Trajectory Credit Assignment (OTCA), described as the first framework to jointly model timesteps and multi-objective rewards. OTCA has two components: Trajectory-Level Credit Decomposition, which estimates the relative importance of each denoising step, and Multi-Objective Credit Allocation, which adaptively weights and combines the reward signals throughout denoising. The authors report that replacing the single scalar reward with this timestep-aware signal consistently improves image and video generation quality across evaluation metrics.

📝 Abstract

Reinforcement learning, particularly Group Relative Policy Optimization (GRPO), has emerged as an effective framework for post-training visual generative models with human preference signals. However, its effectiveness is fundamentally limited by coarse reward credit assignment. In modern visual generation, multiple reward models are often used to capture heterogeneous objectives, such as visual quality, motion consistency, and text alignment. Existing GRPO pipelines typically collapse these rewards into a single static scalar and propagate it uniformly across the entire diffusion trajectory. This design ignores the stage-specific roles of different denoising steps and produces mistimed or incompatible optimization signals. To address this issue, we propose Objective-aware Trajectory Credit Assignment (OTCA), a structured framework for fine-grained GRPO training. OTCA consists of two key components. Trajectory-Level Credit Decomposition estimates the relative importance of different denoising steps. Multi-Objective Credit Allocation adaptively weights and combines multiple reward signals throughout the denoising process. By jointly modeling temporal credit and objective-level credit, OTCA converts coarse reward supervision into a structured, timestep-aware training signal that better matches the iterative nature of diffusion-based generation. Extensive experiments show that OTCA consistently improves both image and video generation quality across evaluation metrics.
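The contrast the abstract draws, one static scalar broadcast to every step versus a signal that varies by timestep and by objective, can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the abstract does not say how OTCA estimates step importance or objective weights, so `step_importance` and `objective_weights` are hypothetical inputs supplied by hand, and all function names are illustrative. The only part taken as standard is the group-relative normalization used in GRPO (standardizing rewards within a group of samples for the same prompt).

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within a group (axis 0), as in GRPO.

    rewards: shape (G,) for one scalar reward, or (G, K) for K objectives.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    mean = rewards.mean(axis=0, keepdims=True)
    std = rewards.std(axis=0, keepdims=True)
    return (rewards - mean) / (std + eps)


def uniform_scalar_credit(rewards, static_weights, num_steps):
    """Coarse baseline: fixed weights collapse K rewards into one scalar,
    and the same advantage is copied to every denoising step.

    Returns an array of shape (G, T).
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    scalar = rewards @ np.asarray(static_weights, dtype=np.float64)  # (G,)
    advantage = group_relative_advantages(scalar)                    # (G,)
    return np.repeat(advantage[:, None], num_steps, axis=1)


def structured_credit(rewards, step_importance, objective_weights):
    """Hypothetical timestep- and objective-aware credit.

    rewards:           (G, K) per-sample, per-objective rewards
    step_importance:   (T,)   non-negative importance per step (assumed given)
    objective_weights: (T, K) non-negative weights per step, with each row
                       summing to a positive value (assumed given)
    Returns an array of shape (G, T).
    """
    per_objective = group_relative_advantages(rewards)               # (G, K)
    w = np.asarray(objective_weights, dtype=np.float64)
    w = w / w.sum(axis=1, keepdims=True)          # each step's weights sum to 1
    s = np.asarray(step_importance, dtype=np.float64)
    s = s / s.mean()                              # average step scale stays 1
    return (per_objective @ w.T) * s[None, :]                        # (G, T)
```

In this sketch, a call such as `structured_credit(rewards, step_importance, objective_weights)` with `rewards` of shape `(G, K)` returns one advantage per sample and per denoising step, which would multiply the per-step policy-gradient term in place of the single broadcast value from `uniform_scalar_credit`. Normalizing each objective separately before mixing is a design choice made here so that objectives on different scales contribute comparably; whether OTCA does the same is not stated in the abstract.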

Problem

Research questions and friction points this paper is trying to address.

reward credit assignment

multi-objective optimization

diffusion models

visual generation

reinforcement learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Objective-aware Credit Assignment

Group Relative Policy Optimization

Diffusion Models