🤖 AI Summary
This work addresses two key limitations of existing multimodal recommendation methods: they ignore individual differences in how users perceive the relevance of item content, and they fail to capture high-order dependencies among modalities. To overcome these challenges, the authors propose the GTC framework, which first employs a user-conditional generative diffusion model to filter multimodal content features in a personalized way. It then explicitly models the joint multimodal dependencies under user perception by optimizing a lower bound on the total correlation of cross-modal representations, moving beyond conventional pairwise contrastive learning. Extensive experiments show that GTC significantly outperforms state-of-the-art baselines on standard benchmarks, achieving up to a 28.30% improvement in NDCG@5. Ablation studies further confirm the effectiveness of each proposed component.
📝 Abstract
Multi-modal recommendation (MMR) enriches item representations with item content, e.g., visual and textual descriptions, to improve upon interaction-only recommenders. The success of MMR hinges on aligning these content modalities with user preferences derived from interaction data, yet the dominant practice of disentangling modality-invariant, preference-driving signals from modality-specific, preference-irrelevant noise is flawed in two ways. First, it assumes item content has a one-size-fits-all relevance to user preferences across all users, contradicting the inherently user-conditional nature of preferences. Second, it optimizes pairwise contrastive losses separately for cross-modal alignment, systematically ignoring the higher-order dependencies that arise when multiple content modalities jointly influence user choices. In this paper, we introduce GTC, a conditional Generative Total Correlation learning framework. We employ an interaction-guided diffusion model to perform user-aware content feature filtering, preserving only the features relevant to each individual user. Furthermore, to capture complete cross-modal dependencies, we optimize a tractable lower bound on the total correlation of item representations across all modalities. Experiments on standard MMR benchmarks show that GTC consistently outperforms state-of-the-art baselines, with gains of up to 28.30% in NDCG@5. Ablation studies validate both conditional preference-driven feature filtering and total correlation optimization, confirming the ability of GTC to model user-conditional relationships in MMR tasks. The code is available at: https://github.com/jingdu-cs/GTC.
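For context on the objective named in the abstract: total correlation is the standard multivariate generalization of mutual information. The sketch below gives the textbook definition of the quantity whose lower bound GTC optimizes (the specific tractable bound is defined in the paper, not here); `Z_1, ..., Z_M` denote item representations from the M content modalities:

```latex
% Total correlation of modality representations Z_1, ..., Z_M:
% the KL divergence between the joint distribution and the product of marginals,
% equivalently the gap between the sum of marginal entropies and the joint entropy.
\mathrm{TC}(Z_1, \dots, Z_M)
  = D_{\mathrm{KL}}\!\left( p(z_1, \dots, z_M) \,\middle\|\, \prod_{m=1}^{M} p(z_m) \right)
  = \sum_{m=1}^{M} H(Z_m) \;-\; H(Z_1, \dots, Z_M).
```

For M = 2 this reduces to ordinary mutual information, which is why a set of separate pairwise contrastive losses captures only a fragment of the joint dependency once three or more modalities influence user choices together.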