🤖 AI Summary
Existing sequence recommendation methods relying on item IDs or lexical matching (e.g., BM25) struggle to capture semantic associations, exhibiting limited performance under cold-start and variable-length interaction scenarios. This paper proposes the first sequence recommendation framework integrating dense semantic retrieval with generative large language models (LLMs): it embeds semantic search into the generation pipeline and achieves, for the first time under low-resource constraints, joint fine-tuning of a 4-bit quantized LLaMA-3 with LoRA for end-to-end, semantic-driven next-item prediction. Our approach abandons conventional ID-based modeling and lexical dependencies, enabling genuine semantic-level sequence modeling. Evaluated on Amazon Beauty, Toys, and Sports datasets, it achieves up to a 52.8% improvement in Recall@5 over state-of-the-art baselines—including GPT4Rec—while demonstrating strong robustness to cold-start users.
📝 Abstract
We propose Generative Low-rank language model with Semantic Search (GLoSS), a generative recommendation framework that combines large language models with dense retrieval for sequential recommendation. Unlike prior methods such as GPT4Rec, which rely on lexical matching via BM25, GLoSS uses semantic search to retrieve relevant items beyond lexical matching. For query generation, we employ 4-bit quantized LlaMA-3 models fine-tuned with low-rank adaptation (LoRA), enabling efficient training and inference on modest hardware. We evaluate GLoSS on three real-world Amazon review datasets: Beauty, Toys, and Sports, and find that it achieves state-of-the-art performance. Compared to traditional ID-based baselines, GLoSS improves Recall@5 by 33.3%, 52.8%, and 15.2%, and NDCG@5 by 30.0%, 42.6%, and 16.1%, respectively. It also outperforms LLM-based recommenders such as P5, GPT4Rec, LlamaRec and E4SRec with Recall@5 gains of 4.3%, 22.8%, and 29.5%. Additionally, user segment evaluations show that GLoSS performs particularly well for cold-start users in the Amazon Toys and Sports datasets, and benefits from longer user histories in Amazon Beauty dataset, demonstrating robustness across different levels of interaction lengths.