AI Summary
This work addresses a fundamental limitation in autoregressive generative recommender models, where the tree-based decoding structure of semantic IDs induces coupling among item probabilities, hindering the model's ability to capture fine-grained user preferences between neighboring items. The study is the first to formally reveal how this hierarchical decoding architecture inherently constrains model expressiveness. To overcome this issue, the authors propose Latte, a novel approach that injects a learnable latent token before each semantic ID, effectively decomposing the single decoding tree into multiple conditional subtrees and thereby decoupling item generation probabilities. Extensive experiments demonstrate that Latte consistently improves recommendation accuracy, yielding an average relative gain of 3.45% in NDCG@10 across benchmark datasets.
Abstract
Generative recommendation (GR) models generate items by autoregressively producing a sequence of discrete tokens that jointly index the target item. However, this autoregressive generation process also induces a structured decoding space whose impact on model expressiveness remains underexplored. Specifically, token-by-token generation can be viewed as traversing a decoding tree induced by semantic ID tokens, where leaf nodes correspond to candidate items. We observe that the item probabilities produced by GR models are strongly correlated with this tree structure: items that are close in the tree tend to receive similar probabilities for any given user, making it difficult to distinguish among them based on user-specific preferences. We further show theoretically that such structural correlations prevent GR models from representing even simple patterns that can be well captured by conventional collaborative filtering models. To mitigate this issue, we propose Latte, a simple modification that injects a latent token before each semantic ID, reshaping the decoding space from a single tree into multiple latent-token-conditioned trees. This design creates multiple paths with varying tree distances between items, relaxing tree-induced probability coupling and yielding an average of 3.45% relative improvement on NDCG@10. Our code is available at https://github.com/hyp1231/Latte.
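To make the decoding-tree view concrete, the following is a minimal toy sketch, not the authors' implementation. The catalog, the probability tables, the latent token names `z0` and `z1`, and the helper names are all hypothetical. It shows two things: under a single tree, an item's probability is the chain-rule product along its semantic ID path, so sibling items share the same prefix factor; and with a latent token prepended, each item is reachable through one path per latent token. Summing over those paths to obtain an item probability is an assumption made here for illustration; the abstract does not specify how Latte aggregates them. The sketch illustrates the structure only and does not reproduce the paper's theoretical result.

```python
from math import prod

# Hypothetical toy catalog: each item is indexed by a two-token semantic ID.
SEMANTIC_IDS = {"A": (0, 0), "B": (0, 1), "C": (1, 0)}


def path_prob(tokens, cond_probs):
    """Chain rule: multiply each token's probability given the prefix before it."""
    return prod(cond_probs[tokens[:i]][tok] for i, tok in enumerate(tokens))


# Made-up next-token distributions for one user, keyed by the decoded prefix.
single_tree = {
    (): {0: 0.8, 1: 0.2},
    (0,): {0: 0.5, 1: 0.5},
    (1,): {0: 1.0},
}

# Items A and B are siblings: both paths include the factor p(first token = 0).
single_probs = {
    item: path_prob(sid, single_tree) for item, sid in SEMANTIC_IDS.items()
}

# Latent-token variant (assumption): a latent token is decoded first, so the
# single tree becomes one tree per latent token.
LATENT_TOKENS = ("z0", "z1")
latent_trees = {
    (): {"z0": 0.5, "z1": 0.5},
    ("z0",): {0: 0.9, 1: 0.1},
    ("z0", 0): {0: 0.9, 1: 0.1},
    ("z0", 1): {0: 1.0},
    ("z1",): {0: 0.5, 1: 0.5},
    ("z1", 0): {0: 0.1, 1: 0.9},
    ("z1", 1): {0: 1.0},
}


def latent_item_prob(sid):
    """Assumed aggregation: sum the item's path probability over latent tokens."""
    return sum(path_prob((z,) + sid, latent_trees) for z in LATENT_TOKENS)


latent_probs = {item: latent_item_prob(sid) for item, sid in SEMANTIC_IDS.items()}

if __name__ == "__main__":
    print("single tree:", single_probs)
    print("latent-conditioned trees:", latent_probs)
```

In the latent variant, A and B remain siblings within each latent-token-conditioned tree, but their paths under different latent tokens only meet at the root, which is one way to read the abstract's "multiple paths with varying tree distances between items".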