🤖 AI Summary
To address the truncation of users' long-term behavioral sequences in LLM-based sequential recommendation, caused by context-length limitations, this paper proposes PatchRec, a multi-granularity compression framework. It compresses the textual tokens of item titles into compact item patches and dynamically aggregates them into denser session patches based on temporal proximity, with earlier interactions compressed more aggressively, enabling efficient modeling of ultra-long sequences. The framework combines two stages: patch pre-training, which familiarizes the LLM with item-level compression patterns, and patch fine-tuning, which teaches the LLM to model sequences at multiple granularities. Evaluated on the Goodreads dataset, PatchRec achieves up to a 32% improvement in Hit Rate@20 over the uncompressed baseline while consuming only 7% of the original token count, substantially reducing computational overhead and enabling real-time recommendation over extremely long user behavioral sequences.
📝 Abstract
Large Language Models for sequential recommendation (LLM4SR), which transform user-item interactions into a language modeling task, have shown promising results. However, due to the limited context window size and the computational cost of Large Language Models (LLMs), current approaches truncate user history, including only the textual information of items from the most recent interactions in the input prompt. This truncation fails to capture users' long-term behavioral patterns. To address this, we propose a multi-grained patching framework -- PatchRec. It compresses the textual tokens of an item title into a compact item patch, and further compresses multiple item patches into a denser session patch, with earlier interactions compressed to a greater degree. The framework consists of two stages: (1) Patch Pre-training, which familiarizes LLMs with item-level compression patterns, and (2) Patch Fine-tuning, which teaches LLMs to model sequences at multiple granularities. Through this simple yet effective approach, empirical results demonstrate that PatchRec outperforms existing methods, achieving significant performance gains while feeding fewer tokens to the LLM. Specifically, PatchRec shows up to a 32% improvement in HR@20 on the Goodreads dataset over the uncompressed baseline, while using only 7% of the tokens. This multi-grained sequence modeling paradigm, with an adjustable compression ratio, enables LLMs to be efficiently deployed in real-world recommendation systems that handle extremely long user behavior sequences.
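The multi-grained compression described above can be illustrated with a minimal sketch. This is not the paper's implementation: mean pooling stands in for whatever learned compressor PatchRec uses, the session-gap threshold and the number of recent items kept at item granularity are illustrative hyperparameters, and all function names are hypothetical. It only shows the structural idea: each item title's token embeddings collapse into one item patch, and older item patches are further merged into session patches grouped by temporal proximity, while the most recent interactions stay at the finer item granularity.

```python
import numpy as np

def item_patch(title_token_embs):
    # Collapse a title's token embeddings (num_tokens, d) into a single
    # item-patch vector. Mean pooling is a placeholder for a learned compressor.
    return title_token_embs.mean(axis=0)

def build_multigrained_sequence(item_embs, timestamps, session_gap=3600.0, n_recent=2):
    """Hypothetical sketch of multi-grained patching.

    item_embs:  list of (num_title_tokens, d) arrays, oldest interaction first.
    timestamps: interaction times (seconds), same order as item_embs.
    Returns a shorter sequence: session patches for old history,
    item patches for the n_recent most recent interactions.
    """
    patches = [item_patch(e) for e in item_embs]
    recent = patches[-n_recent:]                     # keep fine granularity
    old_patches = patches[:-n_recent]
    old_ts = timestamps[:-n_recent]

    # Group older item patches into sessions by temporal proximity,
    # then pool each session into one denser session patch.
    sessions, current = [], [old_patches[0]] if old_patches else []
    for prev_t, t, p in zip(old_ts, old_ts[1:], old_patches[1:]):
        if t - prev_t <= session_gap:
            current.append(p)                        # same session
        else:
            sessions.append(np.mean(current, axis=0))
            current = [p]                            # start a new session
    if current:
        sessions.append(np.mean(current, axis=0))
    return sessions + recent
```

With five interactions whose first two and next one fall into two separate sessions, the ten-plus title tokens of history shrink to four vectors: two session patches plus two recent item patches, which is the kind of token-count reduction the abstract reports.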