PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

📅 2024-12-04

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 1

career value

179K/year

🤖 AI Summary

In tuning-free personalized image generation, local distortions severely degrade overall image quality, yet existing methods lack mechanisms to identify and rectify such fine-grained inconsistencies. Method: We propose the first patch-level preference optimization paradigm, extending Direct Preference Optimization (DPO) beyond image-level judgments. By integrating a pre-trained vision model with self-supervised patch-wise quality assessment, we design a patch-weighted loss function that enables granular quality control without test-time adaptation. Contribution/Results: Our approach achieves state-of-the-art performance on both single- and multi-object personalized image generation benchmarks, outperforming all existing tuning-free baselines. Critically, it requires no test-time fine-tuning while effectively mitigating localized artifacts—marking the first successful application of preference learning at the patch level for generative modeling.

Technology Category

Application Category

📝 Abstract

Finetuning-free personalized image generation can synthesize customized images without test-time finetuning, attracting wide research interest owing to its high efficiency. Current finetuning-free methods simply adopt a single training stage with a simple image reconstruction task, and they typically generate low-quality images inconsistent with the reference images during test-time. To mitigate this problem, inspired by the recent DPO (i.e., direct preference optimization) technique, this work proposes an additional training stage to improve the pre-trained personalized generation models. However, traditional DPO only determines the overall superiority or inferiority of two samples, which is not suitable for personalized image generation because the generated images are commonly inconsistent with the reference images only in some local image patches. To tackle this problem, this work proposes PatchDPO that estimates the quality of image patches within each generated image and accordingly trains the model. To this end, PatchDPO first leverages the pre-trained vision model with a proposed self-supervised training method to estimate the patch quality. Next, PatchDPO adopts a weighted training approach to train the model with the estimated patch quality, which rewards the image patches with high quality while penalizing the image patches with low quality. Experiment results demonstrate that PatchDPO significantly improves the performance of multiple pre-trained personalized generation models, and achieves state-of-the-art performance on both single-object and multi-object personalized image generation. Our code is available at https://github.com/hqhQAQ/PatchDPO.

Problem

Research questions and friction points this paper is trying to address.

Improves pre-trained personalized image generation models

Addresses inconsistency in local image patches

Enhances image quality without test-time finetuning

Innovation

Methods, ideas, or system contributions that make the work stand out.

PatchDPO improves pre-trained models with DPO

Self-supervised patch quality estimation via vision model

Weighted training rewards high-quality patches

🔎 Similar Papers

No similar papers found.

Bosch Group

Renningen, BW, DE

Master Thesis AI-Based Keypoint Refinement for Autonomous Driving

Bosch Group

Hildesheim, NDS, DE

Research Engineer, Language - Personalization, Meta Superintelligence Labs