Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias

📅 2025-03-09

📈 Citations: 0

✨ Influential: 0

career value

186K/year

🤖 AI Summary

Existing personalized image generation methods face two key challenges: (1) the difficult trade-off between text fidelity and subject consistency, and (2) overestimated performance due to training-data-dependent evaluation—leading to overfitting—and biases inherent in subjective human assessment. To address these, we propose: (1) an attractor-guided diffusion fine-tuning framework that enhances subject representation learning via attractor-driven feature filtering; (2) the first high-quality, dedicated test set, decoupling evaluation from training data; and (3) a comprehensive evaluation protocol integrating multi-dimensional automatic metrics (CLIPScore, DINOv2 similarity) with standardized human evaluation. On our new benchmark, our approach achieves a 23% reduction in FID, an 18% improvement in text alignment, and a strong 0.91 correlation between automatic metrics and human scores—significantly enhancing method reliability and reproducibility.

Technology Category

Application Category

📝 Abstract

Personalized image generation via text prompts has great potential to improve daily life and professional work by facilitating the creation of customized visual content. The aim of image personalization is to create images based on a user-provided subject while maintaining both consistency of the subject and flexibility to accommodate various textual descriptions of that subject. However, current methods face challenges in ensuring fidelity to the text prompt while not overfitting to the training data. In this work, we introduce a novel training pipeline that incorporates an attractor to filter out distractions in training images, allowing the model to focus on learning an effective representation of the personalized subject. Moreover, current evaluation methods struggle due to the lack of a dedicated test set. The evaluation set-up typically relies on the training data of the personalization task to compute text-image and image-image similarity scores, which, while useful, tend to overestimate performance. Although human evaluations are commonly used as an alternative, they often suffer from bias and inconsistency. To address these issues, we curate a diverse and high-quality test set with well-designed prompts. With this new benchmark, automatic evaluation metrics can reliably assess model performance

Problem

Research questions and friction points this paper is trying to address.

Addressing overfitting in personalized image generation models

Improving evaluation bias in current assessment methods

Creating reliable benchmarks for automatic performance evaluation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Novel training pipeline with attractor

Diverse high-quality test set

Automatic evaluation metrics benchmark

🔎 Similar Papers

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation