🤖 AI Summary
Existing personalized generative models perform well on single-subject generation but suffer from weak subject consistency and poor text controllability when extended to multi-subject settings—primarily due to the scarcity of high-quality multi-subject data and effective training strategies. To address this, we propose: (1) the first multi-subject evaluation benchmark covering three dimensions—identity, pose, and scene—with seven curated subsets; (2) a paired subject-consistency reward mechanism integrated with generic text-based rewards, optimized via reinforcement learning; and (3) a scalable pipeline for synthesizing multi-subject training data. Experiments demonstrate that our method significantly outperforms baselines in both subject fidelity and text alignment, achieving comprehensive performance gains across all metrics on the new benchmark.
📝 Abstract
Personalized generation models for a single subject have demonstrated remarkable effectiveness, highlighting their significant potential. However, when extended to multiple subjects, existing models often exhibit degraded performance, particularly in maintaining subject consistency and adhering to textual prompts. We attribute these limitations to the absence of high-quality multi-subject datasets and refined post-training strategies. To address these challenges, we propose a scalable multi-subject data generation pipeline that leverages powerful single-subject generation models to construct diverse and high-quality multi-subject training data. Through this dataset, we first enable single-subject personalization models to acquire knowledge of synthesizing multi-image and multi-subject scenarios. Furthermore, to enhance both subject consistency and text controllability, we design a set of Pairwise Subject-Consistency Rewards and general-purpose rewards, which are incorporated into a refined reinforcement learning stage. To comprehensively evaluate multi-subject personalization, we introduce a new benchmark that assesses model performance using seven subsets across three dimensions. Extensive experiments demonstrate the effectiveness of our approach in advancing multi-subject personalized image generation. Github Link: https://github.com/wang-shulei/PSR