ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization

📅 2025-10-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing generative models struggle to capture individualized user preferences because fine-grained, real-world preference annotations are scarce. Method: We introduce the first large-scale in-the-wild image generation interaction dataset, covering 57K users, 242K user-customized LoRAs, 3M text prompts, and 5M generated images with preference annotations, and propose a paradigm for personalizing generative models grounded in authentic user interactions. Using these annotations, we train preference alignment models, evaluate retrieval models and a vision-language model on personalized image retrieval and generative model recommendation, and design an end-to-end framework that edits customized diffusion models in a latent weight space to align with individual user preferences. Contribution/Results: Experiments show improved preference alignment and recommendation quality, validating both the dataset's utility and the generality of the proposed paradigm, which bridges real-user interaction signals with latent weight-space adaptation of generative models.
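The dataset's preference annotations tie a user's prompt to a preferred and a non-preferred generation from a given LoRA. A minimal sketch of how one such record could be represented; the schema, field names, and NDJSON layout below are assumptions, not the dataset's actual release format:

```python
import json
from dataclasses import dataclass

@dataclass
class PreferenceTriplet:
    """One interaction record. All field names are hypothetical."""
    user_id: str
    lora_id: str            # which customized LoRA produced the images
    prompt: str
    preferred_image: str    # path/URL of the image the user kept
    rejected_image: str     # path/URL of the image the user passed over

def load_triplets(path: str) -> list[PreferenceTriplet]:
    """Read newline-delimited JSON records into triplet objects."""
    with open(path) as f:
        return [PreferenceTriplet(**json.loads(line)) for line in f]
```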

📝 Abstract
We introduce ImageGem, a dataset for studying generative models that understand fine-grained individual preferences. We posit that a key challenge hindering the development of such a generative model is the lack of in-the-wild and fine-grained user preference annotations. Our dataset features real-world interaction data from 57K users, who collectively have built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. With user preference annotations from our dataset, we were able to train better preference alignment models. In addition, leveraging individual user preference, we investigated the performance of retrieval models and a vision-language model on personalized image retrieval and generative model recommendation. Finally, we propose an end-to-end framework for editing customized diffusion models in a latent weight space to align with individual user preferences. Our results demonstrate that the ImageGem dataset enables, for the first time, a new paradigm for generative model personalization.
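The abstract reports training "better preference alignment models" on these annotations. One standard way to use chosen/rejected pairs is a Bradley-Terry pairwise objective over a scalar scorer, shown here as a minimal PyTorch sketch; the paper's actual model and objective may differ, and the linear scorer and embedding sizes are placeholders.

```python
import torch
import torch.nn.functional as F

def preference_loss(score_preferred: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry loss: rank the user-preferred image above the rejected one."""
    return -F.logsigmoid(score_preferred - score_rejected).mean()

# Toy usage: a stand-in linear scorer over 512-d image embeddings.
scorer = torch.nn.Linear(512, 1)
emb_preferred = torch.randn(8, 512)   # batch of preferred-image embeddings
emb_rejected = torch.randn(8, 512)    # batch of rejected-image embeddings
loss = preference_loss(scorer(emb_preferred).squeeze(-1),
                       scorer(emb_rejected).squeeze(-1))
loss.backward()
```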
Problem

Research questions and friction points this paper is trying to address.

Lack of in-the-wild fine-grained user preference annotations
Personalized image retrieval and generative model recommendation (a retrieval baseline is sketched after this list)
Aligning customized diffusion models with individual user preferences
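For the retrieval task in the second item, a common baseline is to blend query relevance with a per-user taste vector built from that user's preferred images. The sketch below assumes CLIP-style embeddings and a mixing weight `alpha`; it is an illustrative assumption, not the paper's actual retrieval model.

```python
import torch
import torch.nn.functional as F

def user_profile(preferred_embs: torch.Tensor) -> torch.Tensor:
    """Average a user's preferred-image embeddings into one taste vector."""
    return F.normalize(preferred_embs.mean(dim=0), dim=-1)

def personalized_scores(query_emb: torch.Tensor, user_emb: torch.Tensor,
                        gallery_embs: torch.Tensor, alpha: float = 0.5):
    """Score gallery images by query relevance blended with user taste."""
    query = F.normalize(query_emb, dim=-1)
    gallery = F.normalize(gallery_embs, dim=-1)
    return alpha * (gallery @ query) + (1 - alpha) * (gallery @ user_emb)

# Toy usage with random stand-ins for CLIP-style embeddings.
gallery = torch.randn(1000, 512)             # candidate image embeddings
user = user_profile(torch.randn(20, 512))    # built from 20 preferred images
scores = personalized_scores(torch.randn(512), user, gallery)
top10 = scores.topk(10).indices
```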
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dataset with real-world user interaction data
Framework for editing customized diffusion models in a latent weight space (a generic pattern is sketched after this list)
Training better preference alignment models
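The editing framework in the second item operates in a latent weight space. A generic pattern for that idea: encode flattened LoRA weights into a latent code, nudge the code along a direction that scores higher under the user's preference model, and decode back to weights. The sketch below is hypothetical; dimensions, architecture, and the preference direction are all placeholders, not the paper's design.

```python
import torch
import torch.nn as nn

class LoraAutoencoder(nn.Module):
    """Toy encoder/decoder over flattened LoRA weight vectors."""
    def __init__(self, weight_dim: int = 4096, latent_dim: int = 64):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(weight_dim, 512), nn.ReLU(),
                                 nn.Linear(512, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 512), nn.ReLU(),
                                 nn.Linear(512, weight_dim))

    def edit(self, lora_weights: torch.Tensor,
             preference_direction: torch.Tensor,
             step: float = 0.1) -> torch.Tensor:
        """Encode, nudge the latent toward the preference direction, decode."""
        z = self.enc(lora_weights)
        return self.dec(z + step * preference_direction)

# Toy usage: the direction would come from a model trained on the
# user's preference annotations; here it is random.
ae = LoraAutoencoder()
edited_weights = ae.edit(torch.randn(4096), torch.randn(64))
```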
👥 Authors
Yuanhe Guo
NYU
Linxi Xie
NYU
Zhuoran Chen
New York University Shanghai
Robotics · Computer Vision
Kangrui Yu
NYU
Ryan Po
Stanford
Guandao Yang
Apple
Machine Learning · Computer Vision · Computer Graphics
Gordon Wetzstein
Stanford
Hongyi Wen
NYU