Generate, Not Recommend: Personalized Multimodal Content Generation

📅 2025-06-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing recommender systems merely filter pre-existing content and struggle to generate novel, personalized multimodal content (such as custom images) aligned with fine-grained user preferences. This paper introduces the first user-perception-aware paradigm for personalized multimodal content generation, moving beyond the conventional "retrieve-and-rank" framework. Methodologically, we leverage an any-to-any Large Multimodal Model (LMM), combining supervised fine-tuning with online reinforcement learning to dynamically model both users' historical behavior and their latent interests. We evaluate the approach on two benchmark datasets and through a user study. Results show that the generated images not only align closely with users' past interactions but also proactively surface emerging interests, significantly improving recommendation diversity and user satisfaction.

📝 Abstract
To address the challenge of information overload from massive web content, recommender systems are widely applied to retrieve and present personalized results for users. However, recommendation tasks are inherently constrained to filtering existing items and lack the ability to generate novel concepts, limiting their capacity to fully satisfy user demands and preferences. In this paper, we propose a new paradigm that goes beyond content filtering and selection: directly generating personalized items in a multimodal form, such as images, tailored to individual users. To accomplish this, we leverage any-to-any Large Multimodal Models (LMMs) and train them with both supervised fine-tuning and an online reinforcement learning strategy to equip them with the ability to yield tailored next items for users. Experiments on two benchmark datasets and a user study confirm the efficacy of the proposed method. Notably, the generated images not only align well with users' historical preferences but also exhibit relevance to their potential future interests.
Problem

Research questions and friction points this paper is trying to address.

Overcoming information overload with personalized content generation
Generating novel multimodal items beyond recommendation filtering
Creating user-tailored images matching historical and future interests
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generate personalized multimodal content directly
Use any-to-any Large Multimodal Models (LMMs)
Train with fine-tuning and reinforcement learning
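The two-stage recipe above (supervised fine-tuning on interaction history, then online reinforcement learning against a preference signal) can be illustrated on a toy discrete item space. This is a conceptual sketch only, not the authors' model: the item names, the reward values, and the tiny softmax policy are all hypothetical stand-ins for an LMM generating multimodal content.

```python
import math
import random

random.seed(0)

ITEMS = ["cat_photo", "dog_photo", "car_photo"]  # hypothetical "items"

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Stage 1 ("SFT" analogue): fit the policy to the user's interaction
# history by gradient descent on cross-entropy against empirical counts.
def sft(logits, history, lr=0.5, steps=200):
    counts = [history.count(i) / len(history) for i in range(len(ITEMS))]
    for _ in range(steps):
        probs = softmax(logits)
        logits = [l - lr * (p - c) for l, p, c in zip(logits, probs, counts)]
    return logits

# Stage 2 ("online RL" analogue): REINFORCE against a reward that also
# covers a latent interest under-represented in the history.
def reinforce(logits, reward_fn, lr=0.1, episodes=2000):
    for _ in range(episodes):
        probs = softmax(logits)
        a = random.choices(range(len(ITEMS)), weights=probs)[0]
        r = reward_fn(a)
        # policy gradient: grad log pi(a) = onehot(a) - probs
        logits = [l + lr * r * ((1.0 if i == a else 0.0) - p)
                  for i, (l, p) in enumerate(zip(logits, probs))]
    return logits

# Toy user: history is mostly cat photos, but the (hypothetical) reward
# model reveals a stronger latent interest in dog photos.
history = [0, 0, 0, 1]                       # item indices interacted with
reward = lambda a: {0: 0.6, 1: 1.0, 2: 0.0}[a]

logits = sft([0.0, 0.0, 0.0], history)
logits = reinforce(logits, reward)
probs = softmax(logits)
best = ITEMS[probs.index(max(probs))]
```

After SFT alone the policy mirrors the history (mostly `cat_photo`); the RL stage then shifts probability mass toward the higher-reward latent interest, mirroring the paper's claim that generation can go beyond historical preferences.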
🔎 Similar Papers
No similar papers found.