AI Summary
This work addresses the lack of reliable estimates of the accuracy upper bound in sequential recommendation, a bound set by the intrinsic predictability of the data and needed to judge how much headroom current models have. The authors propose a training-free, entropy-based method whose estimate does not depend on the candidate-set size, avoiding the distortion that Fano's inequality inversion introduces in low-predictability regimes. Experiments on synthetic and real-world datasets show that the estimates track oracle-controlled difficulty and rank datasets consistently with the best offline accuracy achieved by state-of-the-art sequential recommenders (Spearman correlation up to 0.914). The method also supports user-level predictability analysis, revealing systematic differences across user groups, and can guide training data selection under reduced data budgets.
Abstract
Sequential recommender systems have achieved steady gains in offline accuracy, yet it remains unclear how close current models are to the intrinsic accuracy limit imposed by the data. A reliable, model-agnostic estimate of this ceiling would enable principled difficulty assessment and headroom estimation before costly model development. Existing predictability analyses typically combine entropy estimation with Fano's inequality inversion; however, in recommendation they are hindered by sensitivity to candidate-space specification and distortion from Fano-based scaling in low-predictability regimes. We develop an entropy-induced, training-free approach for quantifying accuracy limits in sequential recommendation, yielding a candidate-size-agnostic estimate. Experiments on controlled synthetic generators and diverse real-world benchmarks show that the estimator tracks oracle-controlled difficulty more faithfully than baselines, remains insensitive to candidate-set size, and achieves high rank consistency with best-achieved offline accuracy across state-of-the-art sequential recommenders (Spearman rho up to 0.914). It also supports user-group diagnostics by stratifying users by novelty preference, long-tail exposure, and activity, revealing systematic predictability differences. Furthermore, predictability can guide training data selection: training sets constructed from high-predictability users yield strong downstream performance under reduced data budgets. Overall, the proposed estimator provides a practical reference for assessing attainable accuracy limits, supporting user-group diagnostics, and informing data-centric decisions in sequential recommendation.
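For context, the Fano-inversion baseline that the abstract contrasts against can be sketched as follows. This is the classical formulation, which solves S = H_b(p) + (1 - p) log2(N - 1) for the maximum predictability p given an entropy estimate S and a candidate-space size N. It is not the estimator proposed in this work. The function names, the bisection solver, and the example values are illustrative assumptions.

```python
import math


def binary_entropy(p: float) -> float:
    """Binary entropy in bits; taken as 0 at the endpoints p = 0 and p = 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def fano_max_predictability(entropy_bits: float, n_candidates: int,
                            tol: float = 1e-12) -> float:
    """Invert Fano's inequality by bisection (hypothetical helper).

    Solves entropy_bits = H_b(p) + (1 - p) * log2(N - 1) for p in [1/N, 1].
    The right-hand side decreases from log2(N) at p = 1/N to 0 at p = 1,
    so the root is unique on that interval.
    """
    if n_candidates < 2:
        raise ValueError("need at least two candidates")
    if entropy_bits < 0.0 or entropy_bits > math.log2(n_candidates) + 1e-9:
        raise ValueError("entropy must lie in [0, log2(N)]")

    log_rest = math.log2(n_candidates - 1)

    def fano_rhs(p: float) -> float:
        return binary_entropy(p) + (1.0 - p) * log_rest

    lo, hi = 1.0 / n_candidates, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if fano_rhs(mid) > entropy_bits:
            lo = mid  # bound still too loose: the root lies at higher p
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Same entropy estimate, different assumed candidate-space sizes.
    for n in (10, 1_000, 100_000):
        print(n, round(fano_max_predictability(2.0, n), 4))
```

Holding the entropy fixed and varying `n_candidates` changes the resulting bound, which illustrates the sensitivity to candidate-space specification that motivates a candidate-size-agnostic estimate.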