🤖 AI Summary
Existing generative models (e.g., GANs, VAEs) suffer from training instability or limited expressiveness when modeling user interactions, hindering robust characterization of noisy and dynamically evolving interaction sequences. To address this, we propose DiffRec, the first diffusion-based framework for recommender systems, which models interaction generation via iterative denoising, jointly preserving personalization and ensuring noise robustness. We introduce two key variants: L-DiffRec, which applies dimensionality reduction in a latent space to enhance scalability, and T-DiffRec, which incorporates temporal weighting to better capture dynamic preference evolution. The method integrates latent-space clustering, time-aware reweighting, and multi-stage noise scheduling. Extensive experiments on three benchmark datasets demonstrate that DiffRec consistently outperforms GAN- and VAE-based baselines under clean, noisy, and temporal settings, achieving average improvements of 12.7% in Recall@20 and 11.3% in NDCG@20.
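The iterative-denoising idea above can be illustrated with the standard Gaussian forward process used by diffusion models, where an interaction vector is only *partially* corrupted so personalized signal survives. This is a minimal sketch, not the paper's implementation; the schedule value `alpha_bar_t` and the function name are illustrative assumptions.

```python
import numpy as np

def forward_diffuse(x0, alpha_bar_t, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I).

    Keeping alpha_bar_t well above 0 corrupts the interaction vector only
    mildly, rather than driving it to pure noise as in image synthesis.
    """
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * noise

rng = np.random.default_rng(0)
x0 = np.array([1.0, 0.0, 1.0, 1.0, 0.0])            # binary user-item interactions
xt = forward_diffuse(x0, alpha_bar_t=0.9, rng=rng)  # mild corruption, not pure noise
```

A reverse (denoising) model would then be trained to recover `x0` from `xt`; with `alpha_bar_t=1.0` the vector passes through unchanged, and smaller values interpolate toward pure Gaussian noise.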
📝 Abstract
Generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) are widely used to model the generative process of user interactions. However, they suffer from intrinsic limitations such as the instability of GANs and the restricted representation ability of VAEs. These limitations hinder accurate modeling of the complex generation procedure of user interactions, such as noisy interactions caused by various interference factors. In light of the impressive advantages of Diffusion Models (DMs) over traditional generative models in image synthesis, we propose a novel Diffusion Recommender Model (named DiffRec) to learn the generative process in a denoising manner. To retain personalized information in user interactions, DiffRec reduces the added noise and avoids corrupting users' interactions into pure noise as in image synthesis. In addition, we extend traditional DMs to tackle the unique challenges in recommendation: high resource costs for large-scale item prediction and temporal shifts of user preference. To this end, we propose two extensions of DiffRec: L-DiffRec clusters items for dimension compression and conducts the diffusion processes in the latent space; T-DiffRec reweights user interactions based on their timestamps to encode temporal information. We conduct extensive experiments on three datasets under multiple settings (e.g., clean training, noisy training, and temporal training). The empirical results validate the superiority of DiffRec and its two extensions over competitive baselines.
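The T-DiffRec idea of reweighting interactions by timestamp can be sketched as a linearly increasing weight schedule over a user's time-sorted history. This is a hedged illustration: the function name, the linear schedule, and the bounds `w_min`/`w_max` are assumptions, not the paper's exact formulation.

```python
import numpy as np

def temporal_reweight(item_ids, timestamps, n_items, w_min=0.1, w_max=1.0):
    """Build a weighted interaction vector where more recent interactions
    receive larger weights (linear schedule from w_min to w_max).

    item_ids   : list of interacted item indices for one user
    timestamps : interaction times aligned with item_ids
    n_items    : total catalog size (length of the output vector)
    """
    order = np.argsort(timestamps)                  # oldest -> newest
    weights = np.linspace(w_min, w_max, num=len(item_ids))
    x = np.zeros(n_items)
    for rank, idx in enumerate(order):
        x[item_ids[idx]] = weights[rank]            # newer => closer to w_max
    return x

# Three interactions; item 0 is oldest, item 2 is most recent.
x = temporal_reweight([2, 0, 4], [30, 10, 20], n_items=5)
# x -> [0.1, 0.0, 1.0, 0.0, 0.55]
```

The resulting weighted vector would replace the binary interaction vector as the diffusion input, so the denoising model sees recency-emphasized preferences.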