🤖 AI Summary
Generative multi-behavior recommendation faces three key challenges: poor interpretability of token reasoning, high computational overhead, and insufficient multi-scale modeling of user behavior. To address these, we propose GRACE, a generative recommendation framework. First, GRACE introduces Chain-of-Thought (CoT) tokenization, which integrates knowledge-graph attributes on top of semantic tokenization to explicitly model interpretable reasoning paths. Second, it designs a journey-aware, multi-granularity sparse attention mechanism that preserves long-range dependencies while reducing attention computation by up to 48%. Third, it jointly leverages semantic tokenization, attribute encoding, and multi-scale sequential modeling for efficient generative recommendation. Extensive experiments on the real-world Home and Electronics datasets demonstrate significant improvements: HR@10 increases by 106.9% and 22.1%, respectively, substantially outperforming state-of-the-art methods.
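To make the CoT tokenization idea concrete, here is a minimal sketch of how a single interaction could expand into an explicit attribute-then-item token chain. The attribute fields, special-token format, and the `Interaction`/`cot_tokenize` helpers are illustrative assumptions for this post, not the paper's exact scheme:

```python
# Illustrative sketch of Chain-of-Thought tokenization: serialize one
# user-item interaction as behavior -> knowledge-graph attributes ->
# semantic item tokens. All field names and token formats are assumptions.
from dataclasses import dataclass

@dataclass
class Interaction:
    behavior: str          # e.g. "view", "cart", "purchase"
    category: str          # knowledge-graph attributes of the item
    brand: str
    price_bucket: int      # discretized price level
    semantic_ids: tuple    # codebook indices from semantic tokenization

def cot_tokenize(x: Interaction) -> list[str]:
    """Expand one interaction into an interpretable reasoning path."""
    return [
        f"<beh:{x.behavior}>",
        f"<cat:{x.category}>",
        f"<brand:{x.brand}>",
        f"<price:{x.price_bucket}>",
        # semantic item tokens come last, one per codebook level
        *[f"<sid_{level}:{code}>" for level, code in enumerate(x.semantic_ids)],
    ]

# Example: one purchase event becomes an explicit attribute-to-item chain.
print(cot_tokenize(Interaction("purchase", "kitchen", "acme", 3, (12, 7, 55))))
# ['<beh:purchase>', '<cat:kitchen>', '<brand:acme>', '<price:3>',
#  '<sid_0:12>', '<sid_1:7>', '<sid_2:55>']
```

Because the attribute tokens precede the item tokens, the generated sequence itself exposes the model's reasoning path, which is what makes the generation interpretable.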
📝 Abstract
Generative models have recently demonstrated strong potential in multi-behavior recommendation systems, leveraging the expressive power of transformers and tokenization to generate personalized item sequences. However, their adoption is hindered by (1) the lack of explicit information for token reasoning, (2) high computational cost due to quadratic attention complexity and dense sequence representations after tokenization, and (3) limited multi-scale modeling over user history. In this work, we propose GRACE (Generative Recommendation via journey-aware sparse Attention on Chain-of-thought tokEnization), a novel generative framework for multi-behavior sequential recommendation. GRACE introduces a hybrid Chain-of-Thought (CoT) tokenization method that encodes user-item interactions with explicit attributes from product knowledge graphs (e.g., category, brand, price) on top of semantic tokenization, enabling interpretable and behavior-aligned generation. To address the inefficiency of standard attention, we design a Journey-Aware Sparse Attention (JSA) mechanism that selectively attends to compressed, intra-journey, inter-journey, and current-context segments of the tokenized sequence. Experiments on two real-world datasets show that GRACE significantly outperforms state-of-the-art baselines, improving HR@10 by up to +106.9% and NDCG@10 by up to +106.7% on the Home domain, and HR@10 by +22.1% on the Electronics domain. GRACE also reduces attention computation by up to 48% on long sequences.
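The abstract names four attention scopes: compressed history, intra-journey, inter-journey, and the current context. Below is a minimal sketch of how such a sparse mask could be assembled, assuming journeys are contiguous session segments and distant history is summarized into a few leading tokens; the segment layout, window size, and `journey_sparse_mask` helper are assumptions, not the paper's implementation:

```python
# Illustrative journey-aware sparse attention mask. True = the query
# position may attend to the key position. Segment conventions assumed:
# the first n_compressed tokens summarize distant history, journey_ids
# labels each remaining token's journey, and a small sliding window
# covers the current context.
import numpy as np

def journey_sparse_mask(journey_ids: np.ndarray,
                        n_compressed: int,
                        current_window: int) -> np.ndarray:
    """Boolean causal mask of shape (T, T) combining the four scopes."""
    T = len(journey_ids)
    causal = np.tril(np.ones((T, T), dtype=bool))
    mask = np.zeros((T, T), dtype=bool)

    # 1) compressed history: always visible to every query
    mask[:, :n_compressed] = True

    # 2) intra-journey: full attention among tokens of the same journey
    mask |= journey_ids[:, None] == journey_ids[None, :]

    # 3) inter-journey: attend only to the last token of each earlier journey
    journey_ends = np.r_[journey_ids[:-1] != journey_ids[1:], True]
    mask |= journey_ends[None, :]

    # 4) current context: a local sliding window around each query
    idx = np.arange(T)
    mask |= np.abs(idx[:, None] - idx[None, :]) < current_window

    return mask & causal

# Example: 3 compressed tokens (-1), then tokens from journeys 0, 1, 2.
jid = np.array([-1, -1, -1, 0, 0, 1, 1, 1, 2, 2])
print(journey_sparse_mask(jid, n_compressed=3, current_window=2).astype(int))
```

Since each query attends to a fixed-size compressed prefix, its own journey, one boundary token per earlier journey, and a small local window, the number of attended positions grows far more slowly than the full quadratic mask, which is the kind of saving behind the reported up-to-48% reduction in attention computation.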