AI Summary
Existing large language model (LLM)-based explainable recommendation methods struggle to effectively integrate multimodal information and align collaborative signals with the LLM's semantic space, resulting in insufficient explainability. To address this, this work proposes a multimodal retrieval-augmented explainable recommendation framework that generates multimodal retrieval paths via heuristic search and introduces a lightweight collaborative adapter to encode user-item interaction subgraphs into soft prompts for the LLM. This design enables effective alignment between graph-structured collaborative data and the LLM's semantic space. Extensive experiments on multiple benchmark datasets demonstrate significant improvements in both recommendation accuracy and explanation quality. The code and data are publicly released.
Abstract
Explainable recommendations improve the transparency and credibility of recommender systems and play an important role in personalized recommendation scenarios. Current explainable recommendation methods based on large language models (LLMs) often introduce collaborative information to enhance personalization and accuracy, but they ignore the multimodal information in recommendation datasets. In addition, collaborative information needs to be aligned with the semantic space of the LLM. Introducing collaborative signals through retrieval paths is a promising choice, but most existing retrieval-path collection schemes rely on existing explainable GNN algorithms; although effective, these methods offer limited explainability and are not well suited to the recommendation domain.
To address these challenges, we propose MMP-Refer, a framework using MultiModal Retrieval Paths with Retrieval-augmented LLM For Explainable Recommendation. We use a sequential recommendation model based on joint residual coding to obtain multimodal embeddings, and design a heuristic search algorithm that derives retrieval paths from these embeddings. In the generation phase, we integrate a trainable lightweight collaborative adapter that maps the graph encoding of interaction subgraphs into the semantic space of the LLM as soft prompts, enhancing the LLM's understanding of interaction information. Extensive experiments demonstrate the effectiveness of our approach. Code and data are available at https://github.com/pxcstart/MMP-Refer.
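The collaborative-adapter idea described above can be illustrated with a minimal sketch: pool the node embeddings of a user-item interaction subgraph and project the result into the LLM's embedding space as a handful of soft-prompt vectors. The dimensions, mean pooling, and single linear projection below are illustrative assumptions for exposition, not the paper's actual architecture.

```python
import random

GRAPH_DIM = 4    # size of a node embedding from the graph encoder (assumed)
LLM_DIM = 8      # LLM hidden size (assumed)
NUM_PROMPTS = 2  # number of soft-prompt tokens (assumed)

random.seed(0)

def mean_pool(node_embeddings):
    """Pool node embeddings of the interaction subgraph into one vector."""
    n = len(node_embeddings)
    return [sum(vec[i] for vec in node_embeddings) / n for i in range(GRAPH_DIM)]

# Trainable projection GRAPH_DIM -> NUM_PROMPTS * LLM_DIM (random init here;
# in training this matrix would be learned while the LLM stays frozen).
W = [[random.uniform(-0.1, 0.1) for _ in range(GRAPH_DIM)]
     for _ in range(NUM_PROMPTS * LLM_DIM)]

def adapter(node_embeddings):
    """Map a subgraph encoding to NUM_PROMPTS soft-prompt vectors."""
    pooled = mean_pool(node_embeddings)
    flat = [sum(w * x for w, x in zip(row, pooled)) for row in W]
    return [flat[i * LLM_DIM:(i + 1) * LLM_DIM] for i in range(NUM_PROMPTS)]

# Toy subgraph: three node embeddings (e.g., a user and two items).
subgraph = [[1.0, 0.0, 0.5, -0.5],
            [0.2, 0.3, -0.1, 0.4],
            [-0.3, 0.8, 0.0, 0.1]]
soft_prompts = adapter(subgraph)
print(len(soft_prompts), len(soft_prompts[0]))  # 2 8
```

The resulting vectors would be prepended to the token embeddings of the textual prompt, letting the frozen LLM condition on collaborative structure without any change to its vocabulary.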