PaCo-RL: Advancing Reinforcement Learning for Consistent Image Generation with Pairwise Reward Modeling

📅 2025-12-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Modeling cross-image identity, style, and logical consistency in image generation faces two challenges: scarce annotated training data and the difficulty of modeling human perceptual judgments. This paper proposes PaCo-RL, the first reinforcement learning–based consistency-aware image generation framework that requires no human-annotated supervision. The approach comprises three components: (1) a large-scale, automatically constructed sub-figure pairing dataset; (2) a task-aware, instruction-driven generative reward model integrated with chain-of-thought reasoning; and (3) resolution-decoupled optimization together with a log-tamed multi-reward balancing mechanism. Evaluated across multiple consistency subtasks, including identity preservation, style coherence, and logical plausibility, PaCo-RL achieves state-of-the-art performance, significantly improving alignment with human evaluations while reducing training cost and enhancing convergence stability.

📝 Abstract
Consistent image generation requires faithfully preserving identities, styles, and logical coherence across multiple images, which is essential for applications such as storytelling and character design. Supervised training approaches struggle with this task due to the lack of large-scale datasets capturing visual consistency and the complexity of modeling human perceptual preferences. In this paper, we argue that reinforcement learning (RL) offers a promising alternative by enabling models to learn complex and subjective visual criteria in a data-free manner. To achieve this, we introduce PaCo-RL, a comprehensive framework that combines a specialized consistency reward model with an efficient RL algorithm. The first component, PaCo-Reward, is a pairwise consistency evaluator trained on a large-scale dataset constructed via automated sub-figure pairing. It evaluates consistency through a generative, autoregressive scoring mechanism enhanced by task-aware instructions and chain-of-thought (CoT) reasoning. The second component, PaCo-GRPO, leverages a novel resolution-decoupled optimization strategy to substantially reduce RL cost, alongside a log-tamed multi-reward aggregation mechanism that ensures balanced and stable reward optimization. Extensive experiments across two representative subtasks show that PaCo-Reward significantly improves alignment with human perceptions of visual consistency, and that PaCo-GRPO achieves state-of-the-art consistency performance with improved training efficiency and stability. Together, these results highlight the promise of PaCo-RL as a practical and scalable solution for consistent image generation. The project page is available at https://x-gengroup.github.io/HomePage_PaCo-RL/.
Problem

Research questions and friction points this paper is trying to address.

Develops a reinforcement learning framework for consistent image generation
Addresses lack of datasets and complex human preferences in visual consistency
Introduces pairwise reward modeling and efficient optimization to improve training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pairwise reward model trained on automated sub-figure pairing dataset
Resolution-decoupled optimization strategy reduces RL training cost
Log-tamed multi-reward aggregation ensures balanced stable optimization
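The listing names a "log-tamed multi-reward aggregation" mechanism but gives no formula. A minimal sketch of one common log-damping scheme under that description (the function name, `log1p` form, and optional weights are assumptions, not the paper's exact method):

```python
import math

def log_tamed_aggregate(rewards, weights=None):
    """Combine several non-negative reward signals on a log scale.

    Passing each reward through log(1 + r) compresses large values,
    so a spike in one reward cannot dominate the aggregate the way
    it would under a plain (weighted) sum.
    """
    if weights is None:
        weights = [1.0] * len(rewards)
    return sum(w * math.log1p(r) for w, r in zip(weights, rewards))
```

For example, with two rewards a tenfold spike in one of them grows the aggregate far less than tenfold, which is the balancing behavior the summary attributes to the mechanism.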