🤖 AI Summary
To address the limited representational capacity of sequential recommendation models caused by sparse semantic context, this paper proposes LaMAR—a data-centric framework that pioneers the use of large language models (LLMs) as semantic enhancers. LaMAR automatically generates multi-dimensional semantic signals (e.g., usage scenarios, item intents, and topical summaries) for user behavior sequences in a few-shot setting. It incorporates item metadata through controllable prompt engineering to ensure high novelty and diversity in the generated signals, and embeds these signals seamlessly into mainstream sequential recommendation architectures. Extensive experiments on multiple benchmark datasets demonstrate that LaMAR significantly improves recommendation performance, achieving an average 12.7% gain in Recall@20. Ablation studies confirm that the generated semantic signals effectively strengthen downstream models' semantic understanding and generalization capability.
📝 Abstract
Large Language Models (LLMs) excel at capturing latent semantics and contextual relationships across diverse modalities. However, in modeling user behavior from sequential interaction data, performance often suffers when such semantic context is limited or absent. We introduce LaMAR, an LLM-driven semantic enrichment framework designed to enrich such sequences automatically. LaMAR leverages LLMs in a few-shot setting to generate auxiliary contextual signals by inferring latent semantic aspects of a user's intent and item relationships from existing metadata. These generated signals, such as inferred usage scenarios, item intents, or thematic summaries, augment the original sequences with greater contextual depth. We demonstrate the utility of this generated resource by integrating it into benchmark sequential modeling tasks, where it consistently improves performance. Further analysis shows that LLM-generated signals exhibit high semantic novelty and diversity, enhancing the representational capacity of the downstream models. This work represents a new data-centric paradigm in which LLMs serve as intelligent context generators, contributing a new method for the semi-automatic creation of training data and language resources.
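To make the few-shot enrichment idea concrete, the following is a minimal sketch of how one might assemble such a prompt from item metadata. All names here (`build_enrichment_prompt`, the example signals, the exact prompt wording) are illustrative assumptions, not details taken from the paper:

```python
# Hypothetical sketch of few-shot prompt construction for semantic
# enrichment of a user interaction sequence. The function and example
# signals are illustrative, not the paper's actual implementation.

FEW_SHOT_EXAMPLES = [
    {
        "items": ["trail running shoes", "hydration vest"],
        "scenario": "preparing for a long off-road run",
        "intent": "endurance outdoor exercise",
    },
]

def build_enrichment_prompt(item_titles):
    """Build a few-shot prompt asking an LLM to infer a usage scenario
    and overall intent for a user's item sequence from its metadata."""
    lines = [
        "Infer the usage scenario and overall intent for each item sequence.",
        "",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append("Items: " + ", ".join(ex["items"]))
        lines.append("Scenario: " + ex["scenario"])
        lines.append("Intent: " + ex["intent"])
        lines.append("")
    # The target sequence is appended last; the LLM completes the
    # "Scenario:" and "Intent:" fields as the generated signals.
    lines.append("Items: " + ", ".join(item_titles))
    lines.append("Scenario:")
    return "\n".join(lines)

prompt = build_enrichment_prompt(["yoga mat", "resistance bands"])
print(prompt)
```

The completion returned by the LLM would then be attached to the original interaction sequence as an auxiliary semantic signal before being fed to the downstream sequential recommender.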