Distribution-Guided Auto-Encoder for User Multimodal Interest Cross Fusion

📅 2025-08-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional ID-based recommendation suffers from data sparsity, while existing multimodal approaches predominantly adopt early fusion, neglecting the dynamic contextual influence of user behavior sequences and thus failing to model behavior-driven evolution of multimodal interests. To address this, we propose a behavior-level multimodal interest cross-fusion framework that innovatively integrates a distribution-guided autoencoder mechanism for fine-grained, context-aware cross-modal feature fusion. Specifically, our method jointly models modality-specific features (e.g., text and images) while explicitly capturing how user behavioral patterns dynamically modulate multimodal interest representations. Extensive experiments on multiple benchmark datasets demonstrate that our approach significantly outperforms state-of-the-art baselines in both recommendation accuracy and generalization capability.
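As a rough illustration of the behavior-level fusion described above, the PyTorch sketch below lets an encoded behavior sequence act as the attention query over an item's text and image features, so the fused interest vector is modulated by behavioral context. This is a hypothetical reading, not the paper's released code; the GRU encoder, the cross-attention setup, and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BehaviorConditionedFusion(nn.Module):
    """Fuse per-item text and image features, modulated by the user's
    behavior sequence. Hypothetical sketch, not the paper's DMAE code."""

    def __init__(self, dim: int = 64):
        super().__init__()
        # Summarize the sequence of behavior embeddings into a context vector.
        self.behavior_encoder = nn.GRU(dim, dim, batch_first=True)
        # Behavior context is the query; modality features are keys/values.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, behavior_seq, text_feat, image_feat):
        # behavior_seq: (B, T, dim); text_feat / image_feat: (B, dim)
        _, h = self.behavior_encoder(behavior_seq)                # h: (1, B, dim)
        query = h.transpose(0, 1)                                 # (B, 1, dim)
        modalities = torch.stack([text_feat, image_feat], dim=1)  # (B, 2, dim)
        fused, attn = self.cross_attn(query, modalities, modalities)
        return self.out(fused.squeeze(1)), attn  # behavior-modulated interest

# Toy usage with random tensors.
m = BehaviorConditionedFusion(dim=64)
seq = torch.randn(8, 10, 64)    # 8 users, 10 behaviors each
txt = torch.randn(8, 64)
img = torch.randn(8, 64)
interest, weights = m(seq, txt, img)
print(interest.shape)  # torch.Size([8, 64])
```

The attention weights make the modulation explicit: two users viewing the same item can weight its text and image features differently depending on their histories.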

📝 Abstract
Traditional recommendation methods model a user's interest in a target item by correlating the embedding vectors of item IDs, capturing implicit collaborative-filtering signals. Consequently, such ID-based methods often suffer from data sparsity stemming from the sparse nature of ID features. To alleviate item-ID sparsity, recommendation models incorporate multimodal item information to improve accuracy. However, existing multimodal recommendation methods typically employ early fusion, focusing primarily on combining text and image features while neglecting the contextual influence of user behavior sequences. This oversight prevents multimodal interest representations from adapting dynamically to behavioral patterns, restricting the model's capacity to capture user multimodal interests. Therefore, this paper proposes the Distribution-Guided Multimodal-Interest Auto-Encoder (DMAE), which achieves cross-fusion of user multimodal interests at the behavioral level. Extensive experiments demonstrate the superiority of DMAE.
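For concreteness, here is a generic sketch of the ID-embedding matching that the abstract describes as the traditional baseline. The embedding sizes and dot-product scoring are standard collaborative-filtering assumptions, not details from this paper; the point is that items with few interactions get poorly trained ID vectors, which is the sparsity problem DMAE targets.

```python
import torch
import torch.nn as nn

# Minimal ID-based matching: interest in a target item is scored purely by
# correlating learned ID embeddings (an implicit collaborative-filtering
# signal). Rarely seen IDs receive few gradient updates, so their vectors
# stay near initialization -- the data-sparsity failure mode.
n_users, n_items, dim = 1000, 5000, 32
user_emb = nn.Embedding(n_users, dim)
item_emb = nn.Embedding(n_items, dim)

u = torch.tensor([3, 7])        # user IDs in a batch
i = torch.tensor([42, 4999])    # target item IDs
score = (user_emb(u) * item_emb(i)).sum(-1)  # dot-product preference score
print(score.shape)  # torch.Size([2])
```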
Problem

Research questions and friction points this paper is trying to address.

Addresses data sparsity in ID-based recommendation methods
Overcomes limitations of early multimodal fusion ignoring user behavior
Enables dynamic multimodal interest adaptation from behavioral patterns
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distribution-guided autoencoder for multimodal fusion (see the sketch after this list)
Cross-fusion at behavioral level for dynamic adaptation
Addresses data sparsity through multimodal interest representation
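One plausible reading of the "distribution-guided autoencoder" bullet above is a variational-style autoencoder whose latent posterior over the fused multimodal features is pulled toward a behavior-conditioned prior by a KL term. The sketch below illustrates that reading only; the layer shapes, the Gaussian prior network, and the loss weighting are assumptions, not the authors' DMAE specification.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DistributionGuidedAE(nn.Module):
    """Variational-style autoencoder over fused multimodal features.
    Hedged sketch: the KL term pulls the fused-interest posterior toward a
    behavior-conditioned prior -- one plausible reading of
    'distribution-guided', not the paper's spec."""

    def __init__(self, dim: int = 64, latent: int = 32):
        super().__init__()
        self.enc = nn.Linear(dim * 2, latent * 2)   # -> (mu, logvar)
        self.prior = nn.Linear(dim, latent * 2)     # behavior ctx -> prior params
        self.dec = nn.Linear(latent, dim * 2)       # reconstruct fused features

    def forward(self, text_feat, image_feat, behavior_ctx):
        x = torch.cat([text_feat, image_feat], dim=-1)
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        p_mu, p_logvar = self.prior(behavior_ctx).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        recon = self.dec(z)
        # KL( N(mu, var) || N(p_mu, p_var) ): guides the posterior by behavior.
        kl = 0.5 * (p_logvar - logvar
                    + (logvar.exp() + (mu - p_mu) ** 2) / p_logvar.exp()
                    - 1).sum(-1)
        loss = F.mse_loss(recon, x) + 1e-3 * kl.mean()
        return z, loss

# Toy usage: behavior_ctx could come from a sequence encoder like the one above.
m = DistributionGuidedAE()
z, loss = m(torch.randn(8, 64), torch.randn(8, 64), torch.randn(8, 64))
print(z.shape, loss.item())
```

Conditioning the prior on behavior is what would make the latent distribution "guided": users with different histories induce different target distributions for the same item content, which matches the paper's stated goal of behavior-driven multimodal interest adaptation.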