Expressiveness Limits of Autoregressive Semantic ID Generation in Generative Recommendation

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of autoregressive generative recommender models: the tree-structured decoding space induced by semantic IDs couples item probabilities, so items that are close in the tree receive similar probabilities and are hard to tell apart by user-specific preference. The study shows theoretically that this coupling constrains model expressiveness, preventing such models from representing even simple patterns that conventional collaborative filtering models capture well. To mitigate the issue, the authors propose Latte, which injects a latent token before each semantic ID, reshaping the single decoding tree into multiple latent-token-conditioned trees and relaxing the coupling among item probabilities. Latte yields an average relative improvement of 3.45% on NDCG@10.

📝 Abstract

Generative recommendation (GR) models generate items by autoregressively producing a sequence of discrete tokens that jointly index the target item. However, this autoregressive generation process also induces a structured decoding space whose impact on model expressiveness remains underexplored. Specifically, token-by-token generation can be viewed as traversing a decoding tree induced by semantic ID tokens, where leaf nodes correspond to candidate items. We observe that the item probabilities produced by GR models are strongly correlated with this tree structure: items that are close in the tree tend to receive similar probabilities for any given user, making it difficult to distinguish among them based on user-specific preferences. We further show theoretically that such structural correlations prevent GR models from representing even simple patterns that can be well captured by conventional collaborative filtering models. To mitigate this issue, we propose Latte, a simple modification that injects a latent token before each semantic ID, reshaping the decoding space from a single tree into multiple latent-token-conditioned trees. This design creates multiple paths with varying tree distances between items, relaxing tree-induced probability coupling and yielding an average of 3.45% relative improvement on NDCG@10. Our code is available at https://github.com/hyp1231/Latte.
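The tree-induced coupling described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the catalogue, the two-token semantic IDs, and the function names are all hypothetical, and the logits are passed in by hand rather than produced by a model. It only shows the chain-rule factorization: an item's probability is the product of the token probabilities along its path, so two items under the same prefix share the prefix factor, and their ratio is determined entirely by the last-token distribution.

```python
import math

# Hypothetical toy catalogue: each item is indexed by a 2-token semantic ID.
# item_a and item_b share the prefix token 0, so they are siblings in the tree.
SEMANTIC_IDS = {
    "item_a": (0, 0),
    "item_b": (0, 1),
    "item_c": (1, 0),
    "item_d": (1, 1),
}


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def item_probability(sid, first_logits, second_logits_by_prefix):
    """Chain rule: P(item) = P(token_1) * P(token_2 | token_1)."""
    p_first = softmax(first_logits)[sid[0]]
    p_second = softmax(second_logits_by_prefix[sid[0]])[sid[1]]
    return p_first * p_second


def score_items(first_logits, second_logits_by_prefix):
    """Return the probability of every item in the toy catalogue."""
    return {
        name: item_probability(sid, first_logits, second_logits_by_prefix)
        for name, sid in SEMANTIC_IDS.items()
    }


# Assumed second-token logits, keyed by the first token (the prefix).
second_level = {0: [0.5, 0.0], 1: [0.0, 1.0]}

# Two assumed contexts that differ only in their first-token logits.
scores_x = score_items([2.0, 0.0], second_level)
scores_y = score_items([-1.0, 3.0], second_level)

# The siblings move together: their ratio is unchanged by the prefix factor.
ratio_x = scores_x["item_a"] / scores_x["item_b"]
ratio_y = scores_y["item_a"] / scores_y["item_b"]
```

In this toy setup, any change to the first-token distribution rescales `item_a` and `item_b` by the same factor, so only the second-token logits can separate them. A real model conditions every step on the user history, so this is a simplified view of the coupling, not a reproduction of the paper's analysis.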

Problem

Research questions and friction points this paper is trying to address.

generative recommendation

autoregressive generation

semantic ID

model expressiveness

decoding tree

Innovation

Methods, ideas, or system contributions that make the work stand out.

generative recommendation

autoregressive decoding

semantic ID