AI Summary
This work addresses personalized image aesthetics assessment (PIAA) in the zero-shot setting, where a user's historical ratings are unavailable and individual aesthetic preferences must be modeled without them. It uses user profiles as contextual signals and proposes a profile-based personalization paradigm, realized in P-MLLM: a frozen LLM is augmented with profile-aware selective fusion modules that control how visual information is integrated into the model's hidden states during profile-conditioned reasoning, yielding personalized predictions. The approach needs no historical ratings from the target user and achieves competitive zero-shot performance on recent PIAA benchmarks. It remains effective even with coarse user profiles, which points to the potential of profile-based personalization for zero-shot PIAA.
Abstract
Personalized image aesthetics assessment (PIAA) aims to predict an individual user's subjective rating of an image, which requires modeling user-specific aesthetic preferences. Existing methods rely on historical user ratings for this modeling and therefore struggle when such data are unavailable. We address this zero-shot setting by using user profiles as contextual signals for personalization and adopting a profile-based personalization paradigm. We introduce P-MLLM, a profile-aware multimodal LLM that augments a frozen LLM with selective fusion modules for controlled visual integration. These modules selectively integrate visual information into the model's evolving hidden states during profile-conditioned reasoning, allowing visual information to be incorporated in a profile-aware manner. Experiments on recent PIAA benchmarks show that P-MLLM achieves competitive zero-shot performance and remains effective even with coarse profile information, highlighting the potential of profile-based personalization for zero-shot PIAA.
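To make the idea of selective fusion more concrete, the following is a minimal NumPy sketch of one way visual information could be injected into hidden states under a learned gate. The abstract does not specify the module's internals, so everything here is an assumption for illustration: the function name `selective_fusion`, the use of single-head cross-attention, the sigmoid gate computed from the hidden state, and all weight names and shapes are hypothetical and are not taken from P-MLLM.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def selective_fusion(hidden, visual, w_q, w_k, w_v, w_gate, b_gate):
    """Hypothetical gated cross-attention fusion (illustrative only).

    hidden: (T, d)   hidden states of a frozen LLM layer; the user profile
                     is assumed to be part of the text context, so these
                     states already carry profile information.
    visual: (N, d_v) image features from a visual encoder.
    Returns the fused hidden states (T, d) and the per-token gate (T, 1).
    """
    q = hidden @ w_q                      # (T, d_a) queries from text side
    k = visual @ w_k                      # (N, d_a) keys from image side
    v = visual @ w_v                      # (N, d)   values from image side
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)  # (T, N)
    visual_ctx = attn @ v                 # (T, d) visual summary per token
    # Gate in (0, 1) decides how much visual context each token admits.
    gate = sigmoid(hidden @ w_gate + b_gate)                 # (T, 1)
    # Residual update: the frozen LLM's own hidden state is preserved.
    return hidden + gate * visual_ctx, gate
```

In this sketch only the fusion weights would be trainable, while the backbone that produces `hidden` stays frozen. When the gate saturates near zero, the output reduces to the original hidden states, which is one simple way to obtain the "controlled" integration the abstract describes.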