PatchDPO: Patch-level DPO for Finetuning-free Personalized Image Generation

๐Ÿ“… 2024-12-04
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 3
โœจ Influential: 1
๐Ÿ“„ PDF
๐Ÿค– AI Summary
In tuning-free personalized image generation, local distortions severely degrade overall image quality, yet existing methods lack mechanisms to identify and rectify such fine-grained inconsistencies. Method: We propose the first patch-level preference optimization paradigm, extending Direct Preference Optimization (DPO) beyond image-level judgments. By integrating a pre-trained vision model with self-supervised patch-wise quality assessment, we design a patch-weighted loss function that enables granular quality control without test-time adaptation. Contribution/Results: Our approach achieves state-of-the-art performance on both single- and multi-object personalized image generation benchmarks, outperforming all existing tuning-free baselines. Critically, it requires no test-time fine-tuning while effectively mitigating localized artifactsโ€”marking the first successful application of preference learning at the patch level for generative modeling.

Technology Category

Application Category

๐Ÿ“ Abstract
Finetuning-free personalized image generation can synthesize customized images without test-time finetuning, attracting wide research interest owing to its high efficiency. Current finetuning-free methods simply adopt a single training stage with a simple image reconstruction task, and they typically generate low-quality images inconsistent with the reference images during test-time. To mitigate this problem, inspired by the recent DPO (i.e., direct preference optimization) technique, this work proposes an additional training stage to improve the pre-trained personalized generation models. However, traditional DPO only determines the overall superiority or inferiority of two samples, which is not suitable for personalized image generation because the generated images are commonly inconsistent with the reference images only in some local image patches. To tackle this problem, this work proposes PatchDPO that estimates the quality of image patches within each generated image and accordingly trains the model. To this end, PatchDPO first leverages the pre-trained vision model with a proposed self-supervised training method to estimate the patch quality. Next, PatchDPO adopts a weighted training approach to train the model with the estimated patch quality, which rewards the image patches with high quality while penalizing the image patches with low quality. Experiment results demonstrate that PatchDPO significantly improves the performance of multiple pre-trained personalized generation models, and achieves state-of-the-art performance on both single-object and multi-object personalized image generation. Our code is available at https://github.com/hqhQAQ/PatchDPO.
Problem

Research questions and friction points this paper is trying to address.

Improves pre-trained personalized image generation models
Addresses inconsistency in local image patches
Enhances image quality without test-time finetuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

PatchDPO improves pre-trained models with DPO
Self-supervised patch quality estimation via vision model
Weighted training rewards high-quality patches
๐Ÿ”Ž Similar Papers
No similar papers found.
Qihan Huang
Qihan Huang
PhD Student, Zhejiang University
L
Long Chan
Alibaba Group
J
Jinlong Liu
Alibaba Group
Wanggui He
Wanggui He
Researcher, Alibaba Group
ai
H
Hao Jiang
Alibaba Group
M
Mingli Song
Zhejiang University
J
Jie Song
Zhejiang University