On the Equivalence Between Auto-Regressive Next Token Prediction and Full-Item-Vocabulary Maximum Likelihood Estimation in Generative Recommendation--A Short Note

📅 2026-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of rigorous theoretical justification for autoregressive next-item prediction in existing generative recommender systems. Under the assumption of a bijective mapping between items and k-token sequences, the study formally proves, for the first time, the strict equivalence between k-token autoregressive prediction and maximum likelihood estimation (MLE) over the full item vocabulary. This equivalence holds for both cascaded and parallel tokenization schemes, which represent the two dominant approaches in practice. By establishing this foundational connection, the paper provides a solid theoretical basis for the widely adopted generative recommendation paradigm in industry, revealing the intrinsic consistency between autoregressive training objectives and full-vocabulary MLE, thereby offering critical theoretical guidance for model design and optimization.
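The core of the claimed equivalence can be sketched with the chain rule of probability. This is an illustrative derivation, not the paper's own proof: the symbols $\phi$ (the bijective item-to-token map), $h$ (the user history), $c_j$ (the $j$-th token of an item), and $\theta$ (the model parameters) are notation assumed here for the sketch.

```latex
% Assumed notation: \phi(i) = (c_1, \dots, c_k) is the bijective item-to-token map,
% h is the user history, \theta are the model parameters.
\begin{align}
P_\theta(i \mid h)
  &= P_\theta(c_1, \dots, c_k \mid h)
  && \text{since } \phi \text{ is a bijection} \\
  &= \prod_{j=1}^{k} P_\theta(c_j \mid h, c_{<j})
  && \text{chain rule} \\
-\log P_\theta(i \mid h)
  &= -\sum_{j=1}^{k} \log P_\theta(c_j \mid h, c_{<j})
\end{align}
```

Under these assumptions, the left-hand side of the last line is the per-example FV-MLE loss and the right-hand side is the summed AR-NTP loss over the $k$ tokens, so the two objectives coincide term by term.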

📝 Abstract

Generative recommendation (GR) has emerged as a widely adopted paradigm in industrial sequential recommendation. Current GR systems follow a similar pipeline: tokenization for item indexing, next-token prediction as the training objective and auto-regressive decoding for next-item generation. However, existing GR research mainly focuses on architecture design and empirical performance optimization, with few rigorous theoretical explanations for the working mechanism of auto-regressive next-token prediction in recommendation scenarios. In this work, we formally prove that the k-token auto-regressive next-token prediction (AR-NTP) paradigm is strictly mathematically equivalent to full-item-vocabulary maximum likelihood estimation (FV-MLE), under the core premise of a bijective mapping between items and their corresponding k-token sequences. We further show that this equivalence holds for both cascaded and parallel tokenizations, the two most widely used schemes in industrial GR systems. Our result provides the first formal theoretical foundation for the dominant industrial GR paradigm, and offers principled guidance for future GR system optimization.
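The equivalence stated in the abstract can be checked numerically on a toy case. The following is a minimal sketch, not the paper's construction: it assumes k = 2 tokens per item, a token vocabulary of size 3, and an item set of exactly 3 × 3 = 9 items so that every token pair corresponds to one item. All function names and the item-id convention (`item_id = c1 * V + c2`) are hypothetical choices made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def build_toy_model(seed=0, vocab=3):
    """Random 2-token auto-regressive model for one fixed user history (assumed setup)."""
    rng = np.random.default_rng(seed)
    p1 = softmax(rng.normal(size=vocab))           # P(c1 | h)
    p2 = softmax(rng.normal(size=(vocab, vocab)))  # P(c2 | h, c1); row index is c1
    return p1, p2


def item_distribution(p1, p2):
    """Distribution over all items induced by the AR model, with item_id = c1 * V + c2."""
    return (p1[:, None] * p2).reshape(-1)


def ar_ntp_loss(p1, p2, c1, c2):
    """Sum of the two next-token negative log-likelihoods."""
    return -(np.log(p1[c1]) + np.log(p2[c1, c2]))


def fv_mle_loss(item_probs, item_id):
    """Negative log-likelihood of the target item over the full item vocabulary."""
    return -np.log(item_probs[item_id])


p1, p2 = build_toy_model()
item_probs = item_distribution(p1, p2)
V = p1.shape[0]

# Largest absolute difference between the two losses over every possible target item.
max_gap = max(
    abs(ar_ntp_loss(p1, p2, c1, c2) - fv_mle_loss(item_probs, c1 * V + c2))
    for c1 in range(V)
    for c2 in range(V)
)
print("item probabilities sum to:", item_probs.sum())
print("max |AR-NTP loss - FV-MLE loss|:", max_gap)
```

In this sketch the induced item probabilities form a proper distribution and the two losses agree up to floating-point rounding for every target item. The toy deliberately uses an item set that fills the whole token-pair space; how the result behaves when some token sequences map to no item is not covered by this example.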

Problem

Research questions and friction points this paper is trying to address.

generative recommendation

auto-regressive next-token prediction

maximum likelihood estimation

theoretical foundation

Innovation

Methods, ideas, or system contributions that make the work stand out.

auto-regressive next-token prediction

maximum likelihood estimation

generative recommendation

tokenization