🤖 AI Summary
Problem: Existing generative models struggle to capture individual user preferences because fine-grained, in-the-wild preference annotations are scarce. Method: We introduce ImageGem, a large-scale dataset of real-world image generation interactions from 57K users, comprising 242K customized LoRAs, 3M text prompts, and 5M generated images. Using its preference annotations, we train preference alignment models, evaluate retrieval models and a vision-language model on personalized image retrieval and generative model recommendation, and propose an end-to-end framework that edits customized diffusion models in a latent weight space to align them with individual user preferences. Contribution/Results: The dataset's preference annotations yield better preference alignment models, and the results indicate that ImageGem enables, for the first time, a new paradigm for generative model personalization grounded in authentic user interactions.
📝 Abstract
We introduce ImageGem, a dataset for studying generative models that understand fine-grained individual preferences. We posit that a key challenge hindering the development of such models is the lack of in-the-wild, fine-grained user preference annotations. Our dataset features real-world interaction data from 57K users, who have collectively built 242K customized LoRAs, written 3M text prompts, and created 5M generated images. Using the user preference annotations in our dataset, we trained better preference alignment models. In addition, leveraging individual user preferences, we investigated the performance of retrieval models and a vision-language model on personalized image retrieval and generative model recommendation. Finally, we propose an end-to-end framework for editing customized diffusion models in a latent weight space to align with individual user preferences. Our results demonstrate that the ImageGem dataset enables, for the first time, a new paradigm for generative model personalization.