🤖 AI Summary
Existing generative recommendation (GR) methods unify semantic and behavioral signals into discrete tokens under a standard autoregressive (AR) paradigm, yet overlook their intrinsic relationship—semantics explain *why* an item is selected, while behavior reflects *what* is chosen.
Method: We propose Chunk AutoRegressive Modeling (CAR), the first block-level AR framework that jointly models Semantic IDs (SIDs) and User-behavior IDs (UIDs) to emulate the human cognitive process of “thinking first, then deciding.” CAR incorporates a large-language-model-inspired slow-thinking reasoning mechanism and employs a Transformer-based architecture to generate alternating semantic and behavioral token chunks.
Contribution/Results: CAR achieves consistent improvements of 7.93–22.30% in Recall@5 over conventional AR baselines. Crucially, its performance scales positively with the number of SID bits, empirically validating both the effectiveness and scalability of the proposed paradigm.
📝 Abstract
Generative recommendation (GR) typically encodes behavioral or semantic aspects of item information into discrete tokens, leveraging the standard autoregressive (AR) generation paradigm to make predictions. However, existing methods tend to overlook the intrinsic relationship between the two: semantics usually provide a reasonable explanation of **why** behind the behavior **what**, which may constrain the full potential of GR. To this end, we present Chunk AutoRegressive Modeling (CAR), a new generation paradigm that follows the decision pattern in which users usually consider semantic aspects of items (e.g., brand) and then take actions on target items (e.g., purchase). Our CAR, for the *first time*, incorporates semantics (SIDs) and behavior (UID) into a single autoregressive transformer from an "act-with-think" dual perspective via chunk-level autoregression. Specifically, CAR packs the SIDs and UID of an item into one conceptual chunk as a unified item representation, allowing each decoding step to make a holistic prediction. Experiments show that CAR significantly outperforms existing methods based on traditional AR, improving Recall@5 by 7.93% to 22.30%. Furthermore, we verify a scaling effect between model performance and the number of SID bits, demonstrating that CAR preliminarily emulates a kind of slow-thinking mechanism akin to the reasoning processes observed in large language models (LLMs).
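To make the chunk-packing idea concrete, here is a minimal sketch (not the authors' implementation) of how an item history could be flattened into a chunk-level token sequence: each item contributes its Semantic ID tokens (the "think" part) immediately followed by its User-behavior ID token (the "act" part). The function name, token layout, and example IDs are all hypothetical illustrations.

```python
from typing import List, Tuple

def pack_chunks(items: List[Tuple[List[int], int]]) -> List[int]:
    """Flatten a user's item history into one training sequence.

    Each item forms one conceptual chunk: its SID tokens followed by
    its UID token, so a decoding step predicts the chunk holistically
    ("act-with-think") rather than one token stream in isolation.
    Hypothetical layout for illustration only.
    """
    seq: List[int] = []
    for sids, uid in items:
        seq.extend(sids)  # semantic tokens: "why" the item fits
        seq.append(uid)   # behavioral token: "what" the user chose
    return seq

# Two items, each with 3-bit SIDs plus one UID (made-up values):
history = [([11, 42, 7], 1001), ([11, 42, 9], 1002)]
print(pack_chunks(history))  # [11, 42, 7, 1001, 11, 42, 9, 1002]
```

Under this layout, a fixed SID width per item keeps chunk boundaries predictable, which is what lets each decoding step emit a whole semantic-plus-behavioral chunk at once.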