MMREC: LLM Based Multi-Modal Recommender System

📅 2024-08-08
🏛️ International Workshop on Semantic and Social Media Adaptation and Personalization
📈 Citations: 13
Influential: 2
📄 PDF
🤖 AI Summary
To address the challenges posed by explosive growth in multimodal content for recommendation systems, this paper proposes an LLM-driven unified multimodal latent space modeling framework. The method jointly encodes textual and visual semantics through LLM-guided cross-modal feature extraction, co-embedding learning, and unified latent space projection, enabling semantic-level alignment and collaborative understanding—thereby overcoming representational limitations inherent in conventional unimodal recommendation approaches. Extensive experiments on multiple public benchmarks demonstrate substantial improvements in recommendation discriminability: Recall@10 and NDCG@10 increase by 5.2%–9.7% over current state-of-the-art methods, validating the framework’s effectiveness in both accuracy and contextual relevance.

Technology Category

Application Category

📝 Abstract
The importance of recommender systems is growing rapidly due to the exponential increase in the volume of content generated daily. This surge in content presents unique challenges for designing effective recommender systems. Key among these challenges is the need to effectively leverage the vast amounts of natural language data and images that represent user preferences. This paper presents a novel approach to enhancing recommender systems by leveraging Large Language Models (LLMs) and deep learning techniques. The proposed framework aims to improve the accuracy and relevance of recommendations by incorporating multi-modal information processing and by the use of unified latent space representation. The study explores the potential of LLMs to better understand and utilize natural language data in recommendation contexts, addressing the limitations of previous methods. The framework efficiently extracts and integrates text and image information through LLMs, unifying diverse modalities in a latent space to simplify the learning process for the ranking model. Experimental results demonstrate the enhanced discriminative power of the model when utilizing multi-modal information. This research contributes to the evolving field of recommender systems by showcasing the potential of LLMs and multi-modal data integration to create more personalized and contextually relevant recommendations.
Problem

Research questions and friction points this paper is trying to address.

Enhancing recommender systems with multi-modal data processing
Improving recommendation accuracy using LLMs and deep learning
Unifying text and image data in a latent space
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs for natural language data processing
Multi-modal information integration
Unified latent space representation
🔎 Similar Papers
No similar papers found.
J
Jiahao Tian
Georgia Institute of Technology, Atlanta, Georgia, USA
J
Jinman Zhao
University of Toronto, Toronto, Ontario, Canada
Z
Zhenkai Wang
The University of Texas at Austin, Austin, Texas, USA
Z
Zhicheng Ding
Columbia University, New York, NY, USA