🤖 AI Summary
To address the weak item representations that arise when new items have few user interactions in cold-start recommendation, this paper proposes the Multimodal Adaptive Mixture-of-Experts Network (MAMEX). MAMEX routes heterogeneous multimodal features—such as image and text embeddings—through modality-specific expert networks within a Mixture-of-Experts (MoE) architecture, and a learnable gating mechanism adaptively weights each expert's contribution per item. This captures complex cross-modal dependencies while keeping inference robust when some modalities are missing or uninformative. Unlike static fusion strategies (e.g., concatenation or average pooling), MAMEX enables fine-grained, adaptive multimodal representation fusion. Extensive experiments on multiple public benchmark datasets demonstrate significant improvements in standard recommendation metrics—including Recall@K and NDCG—validating both its effectiveness and generalizability. The source code is publicly available.
📝 Abstract
Recommendation systems face significant challenges in cold-start scenarios, where new items with limited interaction history must be effectively recommended to users. Although multimodal data (e.g., images, text, and audio) offer rich information to address this issue, existing approaches often rely on simplistic integration methods such as concatenation, average pooling, or fixed weighting schemes, which fail to capture the complex relationships between modalities. Our study proposes a novel Mixture of Experts (MoE) framework for multimodal cold-start recommendation, named MAMEX, which dynamically leverages latent representations from different modalities. MAMEX utilizes modality-specific expert networks and introduces a learnable gating mechanism that adaptively weights the contribution of each modality based on its content characteristics. This approach enables MAMEX to emphasize the most informative modalities for each item while maintaining robustness when certain modalities are less relevant or missing. Extensive experiments on benchmark datasets show that MAMEX outperforms state-of-the-art methods in cold-start scenarios, with superior accuracy and adaptability. For reproducibility, the code has been made available on GitHub: https://github.com/L2R-UET/MAMEX.
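The core idea—modality-specific experts combined by a learnable, per-item gate, with missing modalities masked out—can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the class names, dimensions, and single-layer experts are hypothetical simplifications of the architecture the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ModalityExpert:
    """Toy one-layer expert mapping a modality embedding to a shared space."""
    def __init__(self, in_dim, out_dim):
        self.W = rng.normal(scale=0.1, size=(in_dim, out_dim))

    def __call__(self, x):
        return np.tanh(x @ self.W)

class GatedFusion:
    """Per-item gate over modality experts (illustrative, not MAMEX itself)."""
    def __init__(self, modality_dims, out_dim):
        self.experts = [ModalityExpert(d, out_dim) for d in modality_dims]
        # One gating projection per modality, producing a scalar logit.
        self.gate_W = [rng.normal(scale=0.1, size=(d, 1)) for d in modality_dims]

    def __call__(self, feats, mask=None):
        # feats: list of (batch, d_m) arrays, one per modality.
        outs = np.stack([e(f) for e, f in zip(self.experts, feats)], axis=1)   # (B, M, D)
        logits = np.concatenate([f @ w for f, w in zip(feats, self.gate_W)], axis=1)  # (B, M)
        if mask is not None:
            # Boolean (B, M) mask: missing modalities get zero gate weight.
            logits = np.where(mask, logits, -np.inf)
        gates = softmax(logits, axis=1)                                        # (B, M)
        fused = (gates[:, :, None] * outs).sum(axis=1)                         # (B, D)
        return fused, gates

# Usage: fuse image (dim 8) and text (dim 16) features for a batch of 4 items.
feats = [rng.normal(size=(4, 8)), rng.normal(size=(4, 16))]
fusion = GatedFusion([8, 16], out_dim=32)
fused, gates = fusion(feats)                       # gates sum to 1 per item
masked, gates_m = fusion(feats, mask=np.array([[True, False]] * 4))
```

When a modality is masked (e.g., a new item without text), its gate logit is set to negative infinity, so the softmax redistributes all weight to the available modalities—the same robustness property the paper attributes to its gating mechanism.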