Diffusion Recommender Models and the Illusion of Progress: A Concerning Study of Reproducibility and a Conceptual Mismatch

📅 2025-05-14
🤖 AI Summary
This work identifies a conceptual mismatch between diffusion models and top-N recommendation: the generative paradigm of diffusion models fits poorly with ranking under implicit feedback, so the models' generative capabilities end up constrained to a minimum while reported performance gains are inflated. Method: We systematically reproduce four representative diffusion-based recommendation models published at SIGIR in 2023 and 2024, with benchmarking, ablation studies, hyperparameter re-optimization, and carbon footprint analysis. Contribution/Results: All four diffusion models are consistently outperformed by simpler, lighter baselines (e.g., LightGCN). We find a persistent "illusion of progress" in this area, rooted in poor reproducibility, comparisons against untuned baselines, and inappropriate task modeling. The findings call for greater methodological rigor and caution when applying generative models to recommender systems.

📝 Abstract
Countless new machine learning models are published every year and are reported to significantly advance the state of the art in top-n recommendation. However, earlier reproducibility studies indicate that progress in this area may be quite limited. Specifically, various widespread methodological issues, e.g., comparisons with untuned baseline models, have led to an illusion of progress. In this work, our goal is to examine whether these problems persist in today's research. To this end, we aim to reproduce the latest advancements reported from applying modern Denoising Diffusion Probabilistic Models to recommender systems, focusing on four models published at the top-ranked SIGIR conference in 2023 and 2024. Our findings are concerning, revealing persistent methodological problems. Alarmingly, through experiments, we find that the latest recommendation techniques based on diffusion models, despite their computational complexity and substantial carbon footprint, are consistently outperformed by simpler existing models. Furthermore, we identify key mismatches between the characteristics of diffusion models and those of the traditional top-n recommendation task, raising doubts about their suitability for recommendation. We also note that, in the papers we analyze, the generative capabilities of these models are constrained to a minimum. Overall, our results and continued methodological issues call for greater scientific rigor and a disruptive change in the research and publication culture in this area.

Problem

Research questions and friction points this paper is trying to address.

Examining reproducibility issues in diffusion recommender models
Assessing performance gaps between diffusion and simpler models
Identifying mismatches between diffusion models and recommendation tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reproducing diffusion models in recommender systems
Comparing diffusion models with simpler baselines
Identifying mismatch between diffusion and top-n tasks