Discrete Preference Learning for Personalized Multimodal Generation

📅 2026-04-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
Existing personalized generative models lack a dedicated mechanism for modeling user preferences and typically produce unimodal output, even though real-world user interactions are multimodal. This work proposes DPPMG, a two-stage framework. The first stage uses modality-specific graph neural networks to learn user preferences and quantizes them into discrete tokens; the second stage injects those tokens into both the text and image generators. A cross-modal consistent, personalized reward is further designed to enable reinforcement fine-tuning. The authors present DPPMG as a discrete preference learning paradigm tailored for personalized multimodal generation, one that bridges the gap between continuous preference modeling and the discrete token inputs that generative models require. Experiments on two real-world datasets indicate that DPPMG improves both the personalization quality and the cross-modal consistency of generated content.
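The quantization step in the summary can be illustrated with a small sketch. The summary does not say how DPPMG quantizes preferences, so the code below assumes a plain nearest-neighbour codebook lookup (as in vector quantization). The names `quantize`, `preference_tokens`, the `<pref_k>` token format, and the toy codebook are all hypothetical and only illustrate how a continuous preference vector can become a discrete token.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(embedding: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Return the index of the codebook entry closest to `embedding`.

    Assumption: nearest-neighbour lookup; the paper's actual quantizer
    is not described in the summary.
    """
    return min(range(len(codebook)),
               key=lambda i: squared_distance(embedding, codebook[i]))


def preference_tokens(chunks: Sequence[Sequence[float]],
                      codebook: Sequence[Sequence[float]]) -> List[str]:
    """Map each continuous preference vector to a hypothetical token string."""
    return [f"<pref_{quantize(chunk, codebook)}>" for chunk in chunks]


# Toy 2-D codebook with four entries (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Two continuous preference vectors for one user, e.g. one per modality.
user_preferences = [[0.9, 0.1], [0.2, 0.8]]

tokens = preference_tokens(user_preferences, codebook)
print(tokens)
```

In a real system the resulting token strings would be added to a generator's vocabulary so that they can be placed in its input sequence; how DPPMG does this is not specified here.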

📝 Abstract
The emergence of generative models enables the creation of texts and images tailored to users' preferences. Existing personalized generative models have two critical limitations: they lack a dedicated paradigm for accurate preference modeling, and they generate unimodal content even though real-world user interactions are multimodal. Therefore, we propose personalized multimodal generation, which captures modal-specific preferences from multimodal interactions via a dedicated preference model and then feeds them into downstream generators to produce personalized multimodal content. However, this task presents two challenges: (1) the gap between the continuous preferences produced by dedicated modeling and the discrete token inputs intrinsic to generator architectures; (2) potential inconsistency between generated images and texts. To tackle these, we present a two-stage framework called Discrete Preference learning for Personalized Multimodal Generation (DPPMG). In the first stage, we introduce a modal-specific graph neural network (a dedicated preference model) to learn users' modal-specific preferences, which are then quantized into discrete preference tokens. In the second stage, the discrete modal-specific preference tokens are injected into downstream text and image generators. To further enhance cross-modal consistency while preserving personalization, we design a cross-modal consistent and personalized reward to fine-tune token-associated parameters. Extensive experiments on two real-world datasets demonstrate the effectiveness of our model in generating personalized and consistent multimodal content.
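To make the idea of a reward that is both cross-modal consistent and personalized more concrete, here is a minimal sketch. The abstract does not give the reward's formula, so this assumes a weighted sum of two cosine-similarity terms computed on embeddings that live in one shared space. The function `combined_reward`, the weight `alpha`, and the embedding inputs are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combined_reward(text_emb: Sequence[float],
                    image_emb: Sequence[float],
                    text_pref: Sequence[float],
                    image_pref: Sequence[float],
                    alpha: float = 0.5) -> float:
    """Weighted sum of a consistency term and a personalization term.

    Assumption: all four vectors share one embedding space, and `alpha`
    trades consistency against personalization. Both are illustrative.
    """
    # Consistency: how well the generated text and image agree.
    consistency = cosine(text_emb, image_emb)
    # Personalization: how well each output matches the user's
    # preference for that modality, averaged over the two modalities.
    personalization = 0.5 * (cosine(text_emb, text_pref)
                             + cosine(image_emb, image_pref))
    return alpha * consistency + (1.0 - alpha) * personalization


# Outputs that match the preferences but disagree with each other
# score well on personalization and poorly on consistency.
reward = combined_reward([1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0])
print(reward)
```

A scalar of this kind could serve as the signal for fine-tuning only the parameters tied to the preference tokens, which is the part of the generators the abstract says is updated.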
Problem

Research questions and friction points this paper is trying to address.

personalized multimodal generation
discrete preference learning
modal-specific preferences
cross-modal consistency
generative models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Discrete Preference Learning
Personalized Multimodal Generation
Modal-specific Preference
Cross-modal Consistency
Graph Neural Network
Authors
Yuting Zhang
HKUST(GZ)
rPPG, Computer Vision
Ying Sun
The Hong Kong University of Science and Technology (Guangzhou)
Data Mining, Machine Learning
Dazhong Shen
Nanjing University of Aeronautics and Astronautics
Data Mining, Generative AI
Ziwei Xie
OPPO Research Institute, Shenzhen, Guangdong, China
Feng Liu
OPPO Research Institute, Shenzhen, Guangdong, China
Changwang Zhang
OPPO Research Institute, Shenzhen, Guangdong, China
Xiang Liu
OPPO Internet Services System, Shenzhen, Guangdong, China
Jun Wang
OPPO Research Institute, Shenzhen, Guangdong, China
Hui Xiong
Senior Scientist, Candela Corporation
Ultrafast dynamics, atomic molecular physics, free electron laser