🤖 AI Summary
Generative multi-behavior recommendation faces three key challenges: poor interpretability of token reasoning, high computational overhead, and insufficient multi-scale modeling of user behavior. To address these, we propose GRACE, a generative recommendation framework. First, GRACE introduces Chain-of-Thought (CoT) tokenization, which integrates knowledge-graph attributes on top of semantic tokenization to explicitly model interpretable reasoning paths. Second, it designs a journey-aware, multi-granularity sparse attention mechanism that preserves long-range dependencies while reducing attention computation by up to 48%. Third, it jointly leverages semantic tokenization, attribute encoding, and multi-scale sequential modeling for efficient generative recommendation. Extensive experiments on the real-world Home and Electronics datasets demonstrate significant improvements: HR@10 increases by 106.9% and 22.1%, respectively, substantially outperforming state-of-the-art methods.
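To make the CoT tokenization idea concrete, here is a minimal sketch of how a single interaction could expand into an explicit attribute-then-item token chain. The attribute fields, special-token format, and the `Interaction`/`cot_tokenize` helpers are illustrative assumptions for this post, not the paper's exact scheme:

```python
# Illustrative sketch of Chain-of-Thought tokenization: serialize one
# user-item interaction as behavior -> knowledge-graph attributes ->
# semantic item tokens. All field names and token formats are assumptions.
from dataclasses import dataclass

@dataclass
class Interaction:
    behavior: str          # e.g. "view", "cart", "purchase"
    category: str          # knowledge-graph attributes of the item
    brand: str
    price_bucket: int      # discretized price level
    semantic_ids: tuple    # codebook indices from semantic tokenization

def cot_tokenize(x: Interaction) -> list[str]:
    """Expand one interaction into an interpretable reasoning path."""
    return [
        f"<beh:{x.behavior}>",
        f"<cat:{x.category}>",
        f"<brand:{x.brand}>",
        f"<price:{x.price_bucket}>",
        # semantic item tokens come last, one per codebook level
        *[f"<sid_{level}:{code}>" for level, code in enumerate(x.semantic_ids)],
    ]

# Example: one purchase event becomes an explicit attribute-to-item chain.
print(cot_tokenize(Interaction("purchase", "kitchen", "acme", 3, (12, 7, 55))))
# ['<beh:purchase>', '<cat:kitchen>', '<brand:acme>', '<price:3>',
#  '<sid_0:12>', '<sid_1:7>', '<sid_2:55>']
```

Because the attribute tokens precede the item tokens, the generated sequence itself exposes the model's reasoning path, which is what makes the generation interpretable.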
📝 Abstract
Generative models have recently demonstrated strong potential in multi-behavior recommendation systems, leveraging the expressive power of transformers and tokenization to generate personalized item sequences. However, their adoption is hindered by (1) the lack of explicit information for token reasoning, (2) high computational cost due to quadratic attention complexity and dense sequence representations after tokenization, and (3) limited multi-scale modeling over user history. In this work, we propose GRACE (Generative Recommendation via journey-aware sparse Attention on Chain-of-thought tokEnization), a novel generative framework for multi-behavior sequential recommendation. GRACE introduces a hybrid Chain-of-Thought (CoT) tokenization method that encodes user-item interactions with explicit attributes from product knowledge graphs (e.g., category, brand, price) on top of semantic tokenization, enabling interpretable and behavior-aligned generation. To address the inefficiency of standard attention, we design a Journey-Aware Sparse Attention (JSA) mechanism that selectively attends to compressed, intra-journey, inter-journey, and current-context segments of the tokenized sequence. Experiments on two real-world datasets show that GRACE significantly outperforms state-of-the-art baselines, improving HR@10 by up to +106.9% and NDCG@10 by up to +106.7% on the Home domain, and HR@10 by +22.1% on the Electronics domain. GRACE also reduces attention computation by up to 48% on long sequences.
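The abstract names four attention scopes: compressed history, intra-journey, inter-journey, and the current context. Below is a minimal sketch of how such a sparse mask could be assembled, assuming journeys are contiguous session segments and distant history is summarized into a few leading tokens; the segment layout, window size, and `journey_sparse_mask` helper are assumptions, not the paper's implementation:

```python
# Illustrative journey-aware sparse attention mask. True = the query
# position may attend to the key position. Segment conventions assumed:
# the first n_compressed tokens summarize distant history, journey_ids
# labels each remaining token's journey, and a small sliding window
# covers the current context.
import numpy as np

def journey_sparse_mask(journey_ids: np.ndarray,
                        n_compressed: int,
                        current_window: int) -> np.ndarray:
    """Boolean causal mask of shape (T, T) combining the four scopes."""
    T = len(journey_ids)
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = np.zeros((T, T), dtype=bool)

    # 1) compressed history: always visible to every query
    mask[:, :n_compressed] = True

    # 2) intra-journey: full attention among tokens of the same journey
    mask |= journey_ids[:, None] == journey_ids[None, :]

    # 3) inter-journey: attend only to the last token of each earlier journey
    journey_ends = np.r_[journey_ids[:-1] != journey_ids[1:], True]
    mask |= journey_ends[None, :]

    # 4) current context: a local sliding window around each query
    idx = np.arange(T)
    mask |= np.abs(idx[:, None] - idx[None, :]) < current_window

    return mask & causal

# Example: 3 compressed tokens (-1), then tokens from journeys 0, 1, 2.
jid = np.array([-1, -1, -1, 0, 0, 1, 1, 1, 2, 2])
print(journey_sparse_mask(jid, n_compressed=3, current_window=2).astype(int))
```

Since each query attends to a fixed-size compressed prefix, its own journey, one boundary token per earlier journey, and a small local window, the number of attended positions grows far more slowly than the full quadratic mask, which is the kind of saving behind the reported up-to-48% reduction in attention computation.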